The present disclosure relates generally to artificial intelligence and, more particularly, to the early detection of prompt injection attacks using semantic analysis.
The recent breakthroughs in large language models (LLMs), such as ChatGPT and GPT-4, represent new opportunities across a wide spectrum of industries. More specifically, the ability of these models to follow instructions now allows for interactions with tools (also called plugins) that are able to perform tasks such as searching the web, executing code, etc. In addition, agents can be written to perform complex tasks by chaining multiple calls to one or more LLMs.
While LLMs are quite capable of performing a myriad of tasks, they are subject to a new type of attack known as a prompt injection attack. During such an attack, a malicious actor submits a prompt for input to the LLM that includes text intended to exploit the LLM in a way that does not appear to be malicious to the LLM, but causes the LLM to carry out malicious behavior (e.g., disclosing sensitive information, performing restricted actions, etc.).
The implementations herein may be better understood by referring to the following description in conjunction with the accompanying drawings in which like reference numerals indicate identically or functionally similar elements, of which:
According to one or more implementations of the disclosure, a device may obtain a prompt for input to a language model. The device identifies a plurality of topics present in the prompt. The device determines that the prompt is malicious based on a variation in the plurality of topics. The device prevents the prompt from being processed by the language model.
A computer network is a geographically distributed collection of nodes interconnected by communication links and segments for transporting data between end nodes, such as personal computers and workstations, or other devices, such as sensors, etc. Many types of networks are available, ranging from local area networks (LANs) to wide area networks (WANs). LANs typically connect the nodes over dedicated private communications links located in the same general physical location, such as a building or campus. WANs, on the other hand, typically connect geographically dispersed nodes over long-distance communications links, such as common carrier telephone lines, optical lightpaths, synchronous optical networks (SONET), synchronous digital hierarchy (SDH) links, and others. Other types of networks, such as field area networks (FANs), neighborhood area networks (NANs), personal area networks (PANs), etc. may also make up the components of any given computer network.
In various implementations, computer networks may include an Internet of Things network. Loosely, the term “Internet of Things” or “IoT” (or “Internet of Everything” or “IoE”) refers to uniquely identifiable objects (things) and their virtual representations in a network-based architecture. In particular, the IoT involves the ability to connect more than just computers and communications devices, but rather the ability to connect “objects” in general, such as lights, appliances, vehicles, heating, ventilating, and air-conditioning (HVAC), windows and window shades and blinds, doors, locks, etc. The “Internet of Things” thus generally refers to the interconnection of objects (e.g., smart objects), such as sensors and actuators, over a computer network (e.g., via IP), which may be the public Internet or a private network.
Often, IoT networks operate within shared-media mesh networks, such as wireless or wired networks, etc., and are often on what is referred to as Low-Power and Lossy Networks (LLNs), which are a class of network in which both the routers and their interconnect are constrained. That is, LLN devices/routers typically operate with constraints, e.g., processing power, memory, and/or energy (battery), and their interconnects are characterized by, illustratively, high loss rates, low data rates, and/or instability. IoT networks comprise anything from a few dozen to thousands or even millions of devices, and support point-to-point traffic (between devices inside the network), point-to-multipoint traffic (from a central control point such as a root node to a subset of devices inside the network), and multipoint-to-point traffic (from devices inside the network towards a central control point).
Edge computing, also sometimes referred to as “fog” computing, is a distributed approach of cloud implementation that acts as an intermediate layer from local networks (e.g., IoT networks) to the cloud (e.g., centralized and/or shared resources, as will be understood by those skilled in the art). That is, generally, edge computing entails using devices at the network edge to provide application services, including computation, networking, and storage, to the local nodes in the network, in contrast to cloud-based approaches that rely on remote data centers/cloud environments for the services. To this end, an edge node is a functional node that is deployed close to IoT endpoints to provide computing, storage, and networking resources and services. Multiple edge nodes organized or configured together form an edge compute system, to implement a particular solution. Edge nodes and edge systems can have the same or complementary capabilities, in various implementations. That is, each individual edge node does not have to implement the entire spectrum of capabilities. Instead, the edge capabilities may be distributed across multiple edge nodes and systems, which may collaborate to help each other to provide the desired services. In other words, an edge system can include any number of virtualized services and/or data stores that are spread across the distributed edge nodes. This may include a master-slave configuration, publish-subscribe configuration, or peer-to-peer configuration.
Low power and Lossy Networks (LLNs), e.g., certain sensor networks, may be used in a myriad of applications such as for “Smart Grid” and “Smart Cities.” A number of challenges in LLNs have been presented, such as:
In other words, LLNs are a class of network in which both the routers and their interconnect are constrained: LLN routers typically operate with constraints, e.g., processing power, memory, and/or energy (battery), and their interconnects are characterized by, illustratively, high loss rates, low data rates, and/or instability. LLNs comprise anything from a few dozen up to thousands or even millions of LLN routers, and support point-to-point traffic (between devices inside the LLN), point-to-multipoint traffic (from a central control point to a subset of devices inside the LLN), and multipoint-to-point traffic (from devices inside the LLN towards a central control point).
An example implementation of LLNs is an “Internet of Things” network. Loosely, the term “Internet of Things” or “IoT” may be used by those in the art to refer to uniquely identifiable objects (things) and their virtual representations in a network-based architecture. In particular, the next frontier in the evolution of the Internet is the ability to connect more than just computers and communications devices, but rather the ability to connect “objects” in general, such as lights, appliances, vehicles, HVAC (heating, ventilating, and air-conditioning), windows and window shades and blinds, doors, locks, etc. The “Internet of Things” thus generally refers to the interconnection of objects (e.g., smart objects), such as sensors and actuators, over a computer network (e.g., IP), which may be the Public Internet or a private network. Such devices have been used in the industry for decades, usually in the form of non-IP or proprietary protocols that are connected to IP networks by way of protocol translation gateways. With the emergence of a myriad of applications, such as the smart grid advanced metering infrastructure (AMI), smart cities, and building and industrial automation, and cars (e.g., that can interconnect millions of objects for sensing things like power quality, tire pressure, and temperature and that can actuate engines and lights), it has been of the utmost importance to extend the IP protocol suite for these networks.
Specifically, as shown in the example IoT network 100, three illustrative layers are shown, namely cloud layer 110, edge layer 120, and IoT device layer 130. Illustratively, the cloud layer 110 may comprise general connectivity via the Internet 112, and may contain one or more datacenters 114 with one or more centralized servers 116 or other devices, as will be appreciated by those skilled in the art. Within the edge layer 120, various edge devices 122 may perform various data processing functions locally, as opposed to datacenter/cloud-based servers or on the endpoint IoT nodes 132 themselves of IoT device layer 130. For example, edge devices 122 may include edge routers and/or other networking devices that provide connectivity between cloud layer 110 and IoT device layer 130. Data packets (e.g., traffic and/or messages sent between the devices/nodes) may be exchanged among the nodes/devices of the computer network 100 using predefined network communication protocols such as certain known wired protocols, wireless protocols, or other shared-media protocols where appropriate. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Those skilled in the art will understand that any number of nodes, devices, links, etc. may be used in the computer network, and that the view shown herein is for simplicity. Also, those skilled in the art will further understand that while the network is shown in a certain orientation, the network 100 is merely an example illustration that is not meant to limit the disclosure.
Data packets (e.g., traffic and/or messages) may be exchanged among the nodes/devices of the computer network 100 using predefined network communication protocols such as certain known wired protocols, wireless protocols (e.g., IEEE Std. 802.15.4, Wi-Fi, Bluetooth®, DECT-Ultra Low Energy, LoRa, etc.), or other shared-media protocols where appropriate. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
The network interfaces 210 include the mechanical, electrical, and signaling circuitry for communicating data over physical links coupled to the network 100. The network interfaces may be configured to transmit and/or receive data using a variety of different communication protocols. Notably, a physical network interface 210 may also be used to implement one or more virtual network interfaces, such as for virtual private network (VPN) access, known to those skilled in the art.
The memory 240 comprises a plurality of storage locations that are addressable by the processor(s) 220 and the network interfaces 210 for storing software programs and data structures associated with the implementations described herein. The processor 220 may comprise necessary elements or logic adapted to execute the software programs and manipulate the data structures 245. An operating system 242 (e.g., the Internetworking Operating System, or IOS®, of Cisco Systems, Inc., another operating system, etc.), portions of which are typically resident in memory 240 and executed by the processor(s), functionally organizes the node by, inter alia, invoking network operations in support of software processors and/or services executing on the device. These software components may comprise a prompt analysis process 248, as described herein, any of which may alternatively be located within individual network interfaces.
It will be apparent to those skilled in the art that other processor and memory types, including various computer-readable media, may be used to store and execute program instructions pertaining to the techniques described herein. Also, while the description illustrates various processes, it is expressly contemplated that various processes may be embodied as modules configured to operate in accordance with the techniques herein (e.g., according to the functionality of a similar process). Further, while processes may be shown and/or described separately, those skilled in the art will appreciate that processes may be routines or modules within other processes.
In various implementations, as detailed further below, prompt analysis process 248 may include computer executable instructions that, when executed by processor(s) 220, cause device 200 to perform the techniques described herein. To do so, in some implementations, prompt analysis process 248 may utilize machine learning. In general, machine learning is concerned with the design and the development of techniques that take as input empirical data (such as network statistics and performance indicators), and recognize complex patterns in these data. One very common pattern among machine learning techniques is the use of an underlying model M, whose parameters are optimized for minimizing the cost function associated with M, given the input data. For instance, in the context of classification, the model M may be a straight line that separates the data into two classes (e.g., labels) such that M=a*x+b*y+c and the cost function would be the number of misclassified points. The learning process then operates by adjusting the parameters a, b, c such that the number of misclassified points is minimal. After this optimization phase (or learning phase), the model M can be used very easily to classify new data points. Often, M is a statistical model, and the cost function is inversely proportional to the likelihood of M, given the input data.
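By way of illustration only, the line-separating example above might be sketched as follows. The data points, the candidate parameter sets, and the brute-force search are assumptions made for this sketch and are not part of the disclosure; a practical learner would typically use a smoother cost function and a proper optimizer.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Params = Tuple[float, float, float]  # (a, b, c) in M = a*x + b*y + c


def predict(params: Params, point: Point) -> int:
    """Return +1 or -1 depending on which side of the line the point falls."""
    a, b, c = params
    x, y = point
    return 1 if a * x + b * y + c >= 0 else -1


def misclassified(params: Params, points: Sequence[Point], labels: Sequence[int]) -> int:
    """Cost function from the text: the number of misclassified points."""
    return sum(1 for p, label in zip(points, labels) if predict(params, p) != label)


def fit_by_search(points: Sequence[Point], labels: Sequence[int],
                  candidates: Sequence[Params]) -> Params:
    """Hypothetical 'learning phase': keep the candidate with the lowest cost."""
    return min(candidates, key=lambda params: misclassified(params, points, labels))


# Toy data (assumed): two points labeled -1 and two labeled +1.
points: List[Point] = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0), (4.0, 3.0)]
labels: List[int] = [-1, -1, 1, 1]
candidates: List[Params] = [(1.0, 0.0, -0.5), (1.0, 1.0, -3.0), (0.0, 1.0, 1.0)]

best = fit_by_search(points, labels, candidates)  # the line x + y - 3 = 0
```

Once fitted, `predict(best, new_point)` classifies a new data point with a single evaluation of the line equation.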
In various implementations, prompt analysis process 248 may employ one or more supervised, unsupervised, or semi-supervised machine learning models. Generally, supervised learning entails the use of a training set of data, as noted above, that is used to train the model to apply labels to the input data. For example, the training data may include sample telemetry that has been labeled as being indicative of an acceptable performance or unacceptable performance. On the other end of the spectrum are unsupervised techniques that do not require a training set of labels. Notably, while a supervised learning model may look for previously seen patterns that have been labeled as such, an unsupervised model may instead look to whether there are sudden changes or patterns in the behavior of the metrics. Semi-supervised learning models take a middle ground approach that uses a greatly reduced set of labeled training data.
Example machine learning techniques that prompt analysis process 248 can employ may include, but are not limited to, nearest neighbor (NN) techniques (e.g., k-NN models, replicator NN models, etc.), statistical techniques (e.g., Bayesian networks, etc.), clustering techniques (e.g., k-means, mean-shift, etc.), neural networks (e.g., reservoir networks, recurrent neural networks, etc.), support vector machines (SVMs), generative adversarial networks (GANs), long short-term memory (LSTM), logistic or other regression, Markov models or chains, principal component analysis (PCA) (e.g., for linear models), singular value decomposition (SVD), multi-layer perceptron (MLP) artificial neural networks (ANNs) (e.g., for non-linear models), transformer-based models (e.g., BERT), replicating reservoir networks (e.g., for non-linear models, typically for timeseries), random forest classification, or the like.
In further implementations, prompt analysis process 248 may also include one or more generative artificial intelligence/machine learning models. In contrast to discriminative models that simply seek to perform pattern matching for purposes such as anomaly detection, classification, or the like, generative approaches instead seek to generate new content or other data (e.g., audio, video/images, text, etc.), based on an existing body of training data. For instance, in the context of network assurance, prompt analysis process 248 may use a generative model to generate synthetic network traffic based on existing user traffic to test how the network reacts. Example generative approaches can include, but are not limited to, generative adversarial networks (GANs), large language models (LLMs), other transformer models, and the like.
As noted above, the recent breakthroughs in large language models (LLMs), such as ChatGPT and GPT-4, represent new opportunities across a wide spectrum of industries. More specifically, the ability of these models to follow instructions now allows for interactions with tools (also called plugins) that are able to perform tasks such as searching the web, executing code, etc. In addition, agents can be written to perform complex tasks by chaining multiple calls to one or more LLMs. For instance, in the context of computer networks, LLMs could be configured to generate command line interface (CLI) commands for networking devices, scripts or other code for execution by specific nodes in the network, configuration changes for certain endpoints or networking devices, or the like. Other tasks may relate to reporting on captured telemetry or sensor measurements, summarizing that data, or making inferences based on the data. Further tasks may even entail providing some form of control over endpoints in the network (e.g., asking a smart thermostat to adjust its setpoint temperature, etc.).
While LLMs are quite capable of performing a myriad of tasks, they are subject to a new type of attack known as a prompt injection attack. During such an attack, a malicious actor submits a prompt for input to the LLM that includes text intended to exploit the LLM in a way that does not appear to be malicious to the LLM, but causes the LLM to carry out malicious behavior (e.g., disclosing sensitive information, performing restricted actions, etc.).
The techniques herein allow for the early detection of prompt injection attacks against language models (e.g., LLMs) using semantic analysis on the text of the prompt. In various instances, the techniques herein may do so by assessing the degree of topic derailment and/or sentence incoherence of the prompt, to determine whether the prompt is likely an attack. As would be appreciated, this approach differs from deep learning-based approaches to detecting prompt injection attacks, although they could also be used in conjunction therewith.
Illustratively, the techniques described herein may be performed by hardware, software, and/or firmware, such as in accordance with prompt analysis process 248, which may include computer executable instructions executed by the processor 220 (or independent processor of interfaces 210) to perform functions relating to the techniques described herein.
Specifically, according to various implementations, a device may obtain a prompt for input to a language model. The device identifies a plurality of topics present in the prompt. The device determines that the prompt is malicious based on a variation in the plurality of topics. The device prevents the prompt from being processed by the language model.
Operationally,
During execution, prompt analysis process 248 may operate as an intermediary between a user 302 that interacts with a user interface (e.g., a keypad, a touch screen device, a microphone, etc.) and an LLM 306 or other language model. For instance, LLM 306 may be a language model that is cloud-hosted at a remote location from that of the endpoint operated by user 302. In some cases, LLM 306 may be integrated directly into prompt analysis process 248, as well (e.g., with prompt analysis process 248 providing an interface for LLM 306).
In general, LLM 306 may be a full-scale model trained to perform any number of tasks, such as tasks related to the monitoring or control of a computer network. To do so, LLM 306 may be capable of generating scripts or other code for execution, making application programming interface (API) calls, issuing command line interface (CLI) commands, providing control over an IoT actuator or sensor, summarizing or making inferences about captured telemetry data, or the like. In further implementations, though, LLM 306 may be a specialized model capable of performing only a singular task or small subset of tasks.
As would be appreciated, LLM 306 may perform its various tasks in response to prompts (e.g., conversational text) issued by user 302, such as prompt 304. In turn, LLM 306 may issue a response 308 to user 302. For instance, consider the case in which prompt 304 states “what is the reason for my Internet running slowly?” In such a case, LLM 306 may interact with a network controller and/or other networking nodes (e.g., routers, switches, etc.), to obtain information regarding the state of the network. In turn, LLM 306 may issue response 308 back to user 302 such as “there is an abnormal number of users attached to your wireless access point.”
In various implementations, prompt analysis process 248 may implement the techniques herein to detect when a prompt, such as prompt 304, represents a potential prompt injection attack. To do so, prompt analysis process 248 may rely on techniques that do not require the use of a deep learning model, although some implementations also provide for the semantic analysis introduced herein to be used in conjunction with such a model. Thus, when prompt analysis process 248 determines that prompt 304 represents a potential attack, it may take corrective measures such as blocking prompt 304 from being input to LLM 306, issuing an alert to a user interface (e.g., to user 302, to an administrator, etc.), noting the reason for prompt 304 being prevented from being processed by LLM 306, etc.
In various embodiments, one factor that prompt analysis process 248 may assess during its semantic analysis of prompt 304 is the degree of topic derailment present in prompt 304, prior to it being input to LLM 306. Indeed, if the text of prompt 304 contains a high variation in topics, the techniques herein assume that it is suspicious and has a high likelihood of containing malicious content. To do so, prompt analysis process 248 may measure the adherence of a sentence within prompt 304 to a topic.
More specifically, prompt analysis process 248 may measure the degree of topic derailment within prompt 304 by quantifying words using word vectors, and then measuring the distance between the vectors such that greater distances represent larger derailment, which is assumed herein to be indicative of a prompt injection attack. In various implementations, prompt analysis process 248 may do so as follows:
Of course, the above approach is exemplary only, and multiple variations are possible, as desired, within the scope of the teachings herein. For instance, the threshold for topic derailment may be set to any desirable threshold, not just twice the standard deviation from the mean. In addition, the distance metric between two vectors could take the form of any suitable distance measurement, such as their cosine distance, Canberra distance, or correlation, among others. It should also be noted that the value of k above may also be selected as desired, although preliminary testing has shown that values in the range [5,8] yield good results.
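For reference, the distance measurements named above might be sketched in plain Python as follows. The function names and the handling of zero-length vectors are assumptions made for this sketch; the disclosure does not prescribe a particular implementation, and an existing numerical library could be used instead.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cos(angle between u and v); 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumed convention for zero vectors
    return 1.0 - dot / (norm_u * norm_v)


def canberra_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of |a - b| / (|a| + |b|) over components, skipping 0/0 terms."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom != 0.0:
            total += abs(a - b) / denom
    return total


def correlation_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance computed after subtracting each vector's own mean."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_distance([a - mean_u for a in u], [b - mean_v for b in v])
```

All three return larger values for less similar vectors, so any of them could be substituted into a derailment check without changing the direction of the threshold comparison.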
Prior to prompt 402 being input to the LLM, prompt analysis process 248 may identify the topic(s) present in prompt 402 by converting each word in prompt 402 into word vectors 404. For instance, prompt analysis process 248 may leverage the FastText library, which currently includes pre-loaded mappings for 157 languages.
After converting the words of prompt 402 into word vectors 404, prompt analysis process 248 may compute the word vector distances 406 between them. For instance, prompt analysis process 248 may compute the distances between each of word vectors 404 and the mean of them.
Prompt analysis process 248 may then compute aggregate metrics 408 based on the word vector distances 406, such as the mean of means, the maximum distance, and the standard deviation of the distances. Finally, prompt analysis process 248 may determine whether prompt 402 exhibits topic derailment according to a rule 410, such as whether the derailment metric abs(mean_of_means−max_distance) is greater than twice the standard deviation of the distances.
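A minimal sketch of this flow is given below, under several assumptions that are not taken from the disclosure: a tiny hand-made table of two-dimensional vectors stands in for a real word-vector library such as FastText, Euclidean distance to the centroid is used as the distance metric, and the "mean of means" is approximated by the mean of the per-word distances.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Toy stand-in for pretrained word vectors (assumed values, not real embeddings).
TOY_VECTORS: Dict[str, Vector] = {
    "router": (1.0, 0.0),
    "switch": (1.0, 0.0),
    "interface": (1.0, 0.0),
    "latency": (1.0, 0.0),
    "bandwidth": (1.0, 0.0),
    "password": (0.0, 1.0),
}


def embed(prompt: str, table: Dict[str, Vector]) -> List[Vector]:
    """Convert each known word of the prompt into its vector; skip unknown words."""
    return [table[w] for w in prompt.lower().split() if w in table]


def distances_to_centroid(vectors: Sequence[Vector]) -> List[float]:
    """Distance between each word vector and the mean of all word vectors."""
    n = len(vectors)
    centroid = tuple(sum(component) / n for component in zip(*vectors))
    return [math.dist(v, centroid) for v in vectors]


def exhibits_derailment(distances: Sequence[float], num_std: float = 2.0) -> bool:
    """Rule: abs(mean - max) > num_std * standard deviation of the distances."""
    if len(distances) < 2:
        return False  # too few words to measure variation
    mean_d = statistics.fmean(distances)
    max_d = max(distances)
    std_d = statistics.pstdev(distances)
    return abs(mean_d - max_d) > num_std * std_d


def prompt_is_derailed(prompt: str, table: Dict[str, Vector] = TOY_VECTORS) -> bool:
    vectors = embed(prompt, table)
    if len(vectors) < 2:
        return False
    return exhibits_derailment(distances_to_centroid(vectors))
```

With the toy table, a prompt made of nine networking words followed by "password" places one vector far from the centroid and trips the rule. The `num_std` parameter is exposed because the threshold need not be fixed at two standard deviations.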
Referring again to
In various implementations, prompt analysis process 248 may measure the sentence incoherence of prompt 304 by evaluating its noun-adjective and/or verb-adverb pairs. To do so, prompt analysis process 248 may compare each pair to a base control set of pairs and measure the distance of the sentence modifiers from the control modifiers. For instance, prompt analysis process 248 may leverage a corpus of noun-adjective or verb-adverb pairs such as “furry cat,” “wild dog,” “running quickly,” etc. Thus, if prompt 304 includes a pair such as “triangular cat” or “watery dog,” which would not be expected according to the corpus, this may indicate sentence incoherence and a potential prompt injection attack.
In some instances, prompt analysis process 248 may assess the sentence incoherence of prompt 304 as follows:
In this case, the higher the similarity the lower the incoherence.
Here, the suspicion incoherence threshold may be set such that values below the threshold signify prompt injection attack candidates.
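One way the pair comparison might be sketched is shown below. Everything concrete in it is an assumption made for illustration: the control corpus, the toy modifier vectors, the use of cosine similarity, the threshold of 0.5, and the premise that noun-adjective and verb-adverb pairs have already been extracted by an upstream part-of-speech tagger.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]
Pair = Tuple[str, str]  # (head word, modifier), e.g. ("cat", "furry")

# Hypothetical control corpus: head word -> modifiers expected to accompany it.
CONTROL_PAIRS: Dict[str, List[str]] = {"cat": ["furry"], "dog": ["wild"]}

# Toy modifier vectors (assumed values, standing in for real word vectors).
MODIFIER_VECTORS: Dict[str, Vector] = {
    "furry": (1.0, 0.0),
    "fluffy": (0.9, 0.1),
    "wild": (0.8, 0.2),
    "triangular": (0.0, 1.0),
}


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_similarity(pair: Pair) -> Optional[float]:
    """Best similarity between the pair's modifier and the control modifiers
    for the same head word; None when there is nothing to compare against."""
    head, modifier = pair
    vec = MODIFIER_VECTORS.get(modifier)
    controls = [MODIFIER_VECTORS[m] for m in CONTROL_PAIRS.get(head, [])
                if m in MODIFIER_VECTORS]
    if vec is None or not controls:
        return None
    return max(cosine_similarity(vec, c) for c in controls)


def incoherent_pairs(pairs: Sequence[Pair], threshold: float = 0.5) -> List[Pair]:
    """Pairs whose similarity falls below the suspicion threshold."""
    flagged = []
    for pair in pairs:
        score = pair_similarity(pair)
        if score is not None and score < threshold:
            flagged.append(pair)
    return flagged
```

In this sketch, pairs with no control data are skipped rather than flagged, which is a design choice that avoids penalizing vocabulary the corpus simply does not cover.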
At step 515, as detailed above, the device may identify a plurality of topics present in the prompt. For instance, the device may assess the words present in the prompt and convert them into vector representations, such as by using a FastText library.
At step 520, the device may determine that the prompt is malicious based on a variation in the plurality of topics, as described in greater detail above. In various implementations, the device may do so by converting words present in the prompt into vector representations and computing distance metrics between the vector representations. In some implementations, the device may also determine a measure of sentence incoherence of the prompt by evaluating any noun-adjective or verb-adverb word pairs in the prompt, in which case the device may determine that the prompt is malicious based in part on the measure of sentence incoherence. In one implementation, the device may compute the measure of sentence incoherence in part by comparing any noun-adjective or verb-adverb word pairs in the prompt to noun-adjective or verb-adverb word pairs in a baseline corpus. In some cases, the device may determine that the prompt is malicious without analyzing the prompt using a machine learning-based model.
At step 525, as detailed above, the device may prevent the prompt from being processed by the language model. In some implementations, the device may also provide, to a user interface, an indication that the prompt is a suspected prompt injection attack. In another implementation, the device may provide an indication that the prompt was prevented from being processed by the language model because of the measure of sentence incoherence.
Procedure 500 then ends at step 530.
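The overall flow of procedure 500 might be sketched as a small gate placed in front of the language model, as below. The `Verdict` type, the `guard_prompt` function, and the keyword-based `toy_check` are hypothetical names introduced for this sketch; in particular, `toy_check` is only a placeholder for the semantic checks described above and is not the disclosed detection method.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# A check returns a human-readable reason when the prompt looks malicious,
# or None when it finds nothing suspicious.
Check = Callable[[str], Optional[str]]


@dataclass
class Verdict:
    allowed: bool
    reason: Optional[str] = None    # why the prompt was blocked, if it was
    response: Optional[str] = None  # model output, if the prompt was allowed


def guard_prompt(prompt: str, checks: Sequence[Check],
                 model: Callable[[str], str]) -> Verdict:
    """Run every check before the model sees the prompt; block on the first hit."""
    for check in checks:
        reason = check(prompt)
        if reason is not None:
            # The prompt never reaches the model; the reason can be shown
            # to the user or to an administrator.
            return Verdict(allowed=False, reason=reason)
    return Verdict(allowed=True, response=model(prompt))


def toy_check(prompt: str) -> Optional[str]:
    """Placeholder detector (assumed); a real check would use semantic analysis."""
    if "ignore previous instructions" in prompt.lower():
        return "suspected prompt injection: topic derailment"
    return None
```

A caller would pass its own detectors and model client, for example `guard_prompt(user_text, [toy_check], call_llm)`, where `call_llm` is whatever function forwards text to the language model.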
It should be noted that while certain steps within procedure 500 may be optional as described above, the steps shown in
While there have been shown and described illustrative implementations that provide for the early detection of prompt injection attacks using semantic analysis, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the implementations herein. For example, while certain implementations are described herein with respect to using certain models for purposes of performing tasks such as generating CLI commands, making API calls, charting a network, and the like, the models are not limited as such and may be used for other types of tasks, in other implementations. In addition, while certain protocols are shown, other suitable protocols may be used, accordingly.
The foregoing description has been directed to specific implementations. It will be apparent, however, that other variations and modifications may be made to the described implementations, with the attainment of some or all of their advantages. For instance, it is expressly contemplated that the components and/or elements described herein can be implemented as software being stored on a tangible (non-transitory) computer-readable medium (e.g., disks/CDs/RAM/EEPROM/etc.) having program instructions executing on a computer, hardware, firmware, or a combination thereof. Accordingly, this description is to be taken only by way of example and not to otherwise limit the scope of the implementations herein. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the implementations herein.