This disclosure relates generally to time series data and, in some non-limiting embodiments or aspects, to methods, systems, and computer program products for efficient content-based time series retrieval.
A Content-based Time Series Retrieval (CTSR) system is an information retrieval system that allows users to interact with time series emerging from multiple domains, such as finance, healthcare, manufacturing, and/or the like. For example, users seeking to learn more about the source of a time series can submit the time series as a query to the CTSR system and retrieve a list of relevant time series with associated metadata. By analyzing the retrieved metadata, users can gather more information about the source of the time series. Because CTSR systems may work with time series data from diverse domains, CTSR systems may use a high-capacity model to effectively measure the similarity between different time series. Further, users may require the model within the CTSR system to compute the similarity scores in an efficient manner as the users interact with the system in real-time.
Accordingly, provided are improved methods, systems, and computer program products for content-based time series retrieval.
According to some non-limiting embodiments or aspects, provided is a method, including: obtaining, with at least one processor, from at least one database, a plurality of known time series; for each known time series of the plurality of known time series: computing, with the at least one processor, a pairwise distance matrix between that known time series and each learned template of a plurality of learned templates to generate a plurality of pairwise distance matrices; stacking, with the at least one processor, the plurality of pairwise distance matrices together to generate a tensor; and processing, with the at least one processor, with a residual network, the tensor, wherein the residual network receives, as input, the tensor, and provides, as output, a feature vector for that known time series; and providing, with the at least one processor, the feature vector for each known time series of the plurality of known time series.
In some non-limiting embodiments or aspects, the method further includes: storing, with the at least one processor, in the at least one database, the feature vector for each known time series of the plurality of known time series.
In some non-limiting embodiments or aspects, the method further includes: obtaining, with the at least one processor, an unknown time series; computing, with the at least one processor, a pairwise distance matrix between the unknown time series and each learned template of the plurality of learned templates to generate a further plurality of pairwise distance matrices; stacking, with the at least one processor, the further plurality of pairwise distance matrices together to generate a further tensor; processing, with the at least one processor, with the residual network, the further tensor, wherein the residual network receives, as input, the further tensor, and provides, as output, a feature vector for the unknown time series; for each known time series of the plurality of known time series stored in the database, determining, with the at least one processor, based on the stored feature vector for that known time series and the feature vector for the unknown time series, a distance between that known time series and the unknown time series; and providing, with the at least one processor, based on the distance between each known time series and the unknown time series, at least one known time series determined to be similar to the unknown time series.
In some non-limiting embodiments or aspects, the residual network is trained using a loss function defined according to the following Equation:

$$\mathcal{L}(\theta;\mathcal{B})=-\frac{1}{m}\sum_{i=1}^{m}\ln\sigma\left(f_{\theta}(t_i,t_i^{+})-f_{\theta}(t_i,t_i^{-})\right)$$

where 𝓑 is a batch of training data 𝓑=[b1, . . . , bm], m is a batch size, each sample bi=(ti, ti+, ti−) in the batch is a tuple including a query time series ti, a positive time series ti+, and a negative time series ti−, σ(·) is a sigmoid function, and ƒθ(·, ·) is the residual network.
In some non-limiting embodiments or aspects, the plurality of known time series includes a plurality of known transaction time series associated with a plurality of merchants, and wherein each known time series is associated with metadata associated with a merchant associated with that known time series.
In some non-limiting embodiments or aspects, the plurality of learned templates includes thirty-two learned templates, wherein the plurality of pairwise distance matrices includes thirty-two pairwise distance matrices, wherein the tensor includes an input dimension of thirty-two, and wherein the feature vector for each known time series of the plurality of known time series includes a size sixty-four vector.
In some non-limiting embodiments or aspects, the residual network includes a two-dimensional residual network.
According to some non-limiting embodiments or aspects, provided is a system, including: at least one processor coupled to a memory and configured to: obtain, from at least one database, a plurality of known time series; for each known time series of the plurality of known time series: compute a pairwise distance matrix between that known time series and each learned template of a plurality of learned templates to generate a plurality of pairwise distance matrices; stack the plurality of pairwise distance matrices together to generate a tensor; and process, with a residual network, the tensor, wherein the residual network receives, as input, the tensor, and provides, as output, a feature vector for that known time series; and provide the feature vector for each known time series of the plurality of known time series.
In some non-limiting embodiments or aspects, the at least one processor is further configured to: store, in the at least one database, the feature vector for each known time series of the plurality of known time series.
In some non-limiting embodiments or aspects, the at least one processor is further configured to: obtain an unknown time series; compute a pairwise distance matrix between the unknown time series and each learned template of the plurality of learned templates to generate a further plurality of pairwise distance matrices; stack the further plurality of pairwise distance matrices together to generate a further tensor; process, with the residual network, the further tensor, wherein the residual network receives, as input, the further tensor, and provides, as output, a feature vector for the unknown time series; for each known time series of the plurality of known time series stored in the database, determine, based on the stored feature vector for that known time series and the feature vector for the unknown time series, a distance between that known time series and the unknown time series; and provide, based on the distance between each known time series and the unknown time series, at least one known time series determined to be similar to the unknown time series.
In some non-limiting embodiments or aspects, the residual network is trained using a loss function defined according to the following Equation:

$$\mathcal{L}(\theta;\mathcal{B})=-\frac{1}{m}\sum_{i=1}^{m}\ln\sigma\left(f_{\theta}(t_i,t_i^{+})-f_{\theta}(t_i,t_i^{-})\right)$$

where 𝓑 is a batch of training data 𝓑=[b1, . . . , bm], m is a batch size, each sample bi=(ti, ti+, ti−) in the batch is a tuple including a query time series ti, a positive time series ti+, and a negative time series ti−, σ(·) is a sigmoid function, and ƒθ(·, ·) is the residual network.
In some non-limiting embodiments or aspects, the plurality of known time series includes a plurality of known transaction time series associated with a plurality of merchants, and wherein each known time series is associated with metadata associated with a merchant associated with that known time series.
In some non-limiting embodiments or aspects, the plurality of learned templates includes thirty-two learned templates, wherein the plurality of pairwise distance matrices includes thirty-two pairwise distance matrices, wherein the tensor includes an input dimension of thirty-two, and wherein the feature vector for each known time series of the plurality of known time series includes a size sixty-four vector.
In some non-limiting embodiments or aspects, the residual network includes a two-dimensional residual network.
According to some non-limiting embodiments or aspects, provided is a computer program product including a non-transitory computer readable medium including program instructions which, when executed by at least one processor, cause the at least one processor to: obtain, from at least one database, a plurality of known time series; for each known time series of the plurality of known time series: compute a pairwise distance matrix between that known time series and each learned template of a plurality of learned templates to generate a plurality of pairwise distance matrices; stack the plurality of pairwise distance matrices together to generate a tensor; and process, with a residual network, the tensor, wherein the residual network receives, as input, the tensor, and provides, as output, a feature vector for that known time series; and provide the feature vector for each known time series of the plurality of known time series.
In some non-limiting embodiments or aspects, the program instructions, when executed by the at least one processor, further cause the at least one processor to: store, in the at least one database, the feature vector for each known time series of the plurality of known time series.
In some non-limiting embodiments or aspects, the program instructions, when executed by the at least one processor, further cause the at least one processor to: obtain an unknown time series; compute a pairwise distance matrix between the unknown time series and each learned template of the plurality of learned templates to generate a further plurality of pairwise distance matrices; stack the further plurality of pairwise distance matrices together to generate a further tensor; process, with the residual network, the further tensor, wherein the residual network receives, as input, the further tensor, and provides, as output, a feature vector for the unknown time series; for each known time series of the plurality of known time series stored in the database, determine, based on the stored feature vector for that known time series and the feature vector for the unknown time series, a distance between that known time series and the unknown time series; and provide, based on the distance between each known time series and the unknown time series, at least one known time series determined to be similar to the unknown time series.
In some non-limiting embodiments or aspects, the residual network is trained using a loss function defined according to the following Equation:

$$\mathcal{L}(\theta;\mathcal{B})=-\frac{1}{m}\sum_{i=1}^{m}\ln\sigma\left(f_{\theta}(t_i,t_i^{+})-f_{\theta}(t_i,t_i^{-})\right)$$

where 𝓑 is a batch of training data 𝓑=[b1, . . . , bm], m is a batch size, each sample bi=(ti, ti+, ti−) in the batch is a tuple including a query time series ti, a positive time series ti+, and a negative time series ti−, σ(·) is a sigmoid function, and ƒθ(·, ·) is the residual network.
In some non-limiting embodiments or aspects, the plurality of known time series includes a plurality of known transaction time series associated with a plurality of merchants, and wherein each known time series is associated with metadata associated with a merchant associated with that known time series.
In some non-limiting embodiments or aspects, the plurality of learned templates includes thirty-two learned templates, wherein the plurality of pairwise distance matrices includes thirty-two pairwise distance matrices, wherein the tensor includes an input dimension of thirty-two, and wherein the feature vector for each known time series of the plurality of known time series includes a size sixty-four vector, and wherein the residual network includes a two-dimensional residual network.
Further non-limiting embodiments or aspects are set forth in the following numbered clauses:
Clause 1: A method, comprising: obtaining, with at least one processor, from at least one database, a plurality of known time series; for each known time series of the plurality of known time series: computing, with the at least one processor, a pairwise distance matrix between that known time series and each learned template of a plurality of learned templates to generate a plurality of pairwise distance matrices; stacking, with the at least one processor, the plurality of pairwise distance matrices together to generate a tensor; and processing, with the at least one processor, with a residual network, the tensor, wherein the residual network receives, as input, the tensor, and provides, as output, a feature vector for that known time series; and providing, with the at least one processor, the feature vector for each known time series of the plurality of known time series.
Clause 2: The method of clause 1, further comprising: storing, with the at least one processor, in the at least one database, the feature vector for each known time series of the plurality of known time series.
Clause 3: The method of clause 1 or 2, further comprising: obtaining, with the at least one processor, an unknown time series; computing, with the at least one processor, a pairwise distance matrix between the unknown time series and each learned template of the plurality of learned templates to generate a further plurality of pairwise distance matrices; stacking, with the at least one processor, the further plurality of pairwise distance matrices together to generate a further tensor; processing, with the at least one processor, with the residual network, the further tensor, wherein the residual network receives, as input, the further tensor, and provides, as output, a feature vector for the unknown time series; for each known time series of the plurality of known time series stored in the database, determining, with the at least one processor, based on the stored feature vector for that known time series and the feature vector for the unknown time series, a distance between that known time series and the unknown time series; and providing, with the at least one processor, based on the distance between each known time series and the unknown time series, at least one known time series determined to be similar to the unknown time series.
Clause 4: The method of any of clauses 1-3, wherein the residual network is trained using a loss function defined according to the following Equation:

$$\mathcal{L}(\theta;\mathcal{B})=-\frac{1}{m}\sum_{i=1}^{m}\ln\sigma\left(f_{\theta}(t_i,t_i^{+})-f_{\theta}(t_i,t_i^{-})\right)$$

where 𝓑 is a batch of training data 𝓑=[b1, . . . , bm], m is a batch size, each sample bi=(ti, ti+, ti−) in the batch is a tuple including a query time series ti, a positive time series ti+, and a negative time series ti−, σ(·) is a sigmoid function, and ƒθ(·, ·) is the residual network.
Clause 5: The method of any of clauses 1-4, wherein the plurality of known time series includes a plurality of known transaction time series associated with a plurality of merchants, and wherein each known time series is associated with metadata associated with a merchant associated with that known time series.
Clause 6: The method of any of clauses 1-5, wherein the plurality of learned templates includes thirty-two learned templates, wherein the plurality of pairwise distance matrices includes thirty-two pairwise distance matrices, wherein the tensor includes an input dimension of thirty-two, and wherein the feature vector for each known time series of the plurality of known time series includes a size sixty-four vector.
Clause 7: The method of any of clauses 1-6, wherein the residual network includes a two-dimensional residual network.
Clause 8: A system, comprising: at least one processor coupled to a memory and configured to: obtain, from at least one database, a plurality of known time series; for each known time series of the plurality of known time series: compute a pairwise distance matrix between that known time series and each learned template of a plurality of learned templates to generate a plurality of pairwise distance matrices; stack the plurality of pairwise distance matrices together to generate a tensor; and process, with a residual network, the tensor, wherein the residual network receives, as input, the tensor, and provides, as output, a feature vector for that known time series; and provide the feature vector for each known time series of the plurality of known time series.
Clause 9: The system of clause 8, wherein the at least one processor is further configured to: store, in the at least one database, the feature vector for each known time series of the plurality of known time series.
Clause 10: The system of clause 8 or 9, wherein the at least one processor is further configured to: obtain an unknown time series; compute a pairwise distance matrix between the unknown time series and each learned template of the plurality of learned templates to generate a further plurality of pairwise distance matrices; stack the further plurality of pairwise distance matrices together to generate a further tensor; process, with the residual network, the further tensor, wherein the residual network receives, as input, the further tensor, and provides, as output, a feature vector for the unknown time series; for each known time series of the plurality of known time series stored in the database, determine, based on the stored feature vector for that known time series and the feature vector for the unknown time series, a distance between that known time series and the unknown time series; and provide, based on the distance between each known time series and the unknown time series, at least one known time series determined to be similar to the unknown time series.
Clause 11: The system of any of clauses 8-10, wherein the residual network is trained using a loss function defined according to the following Equation:

$$\mathcal{L}(\theta;\mathcal{B})=-\frac{1}{m}\sum_{i=1}^{m}\ln\sigma\left(f_{\theta}(t_i,t_i^{+})-f_{\theta}(t_i,t_i^{-})\right)$$

where 𝓑 is a batch of training data 𝓑=[b1, . . . , bm], m is a batch size, each sample bi=(ti, ti+, ti−) in the batch is a tuple including a query time series ti, a positive time series ti+, and a negative time series ti−, σ(·) is a sigmoid function, and ƒθ(·, ·) is the residual network.
Clause 12: The system of any of clauses 8-11, wherein the plurality of known time series includes a plurality of known transaction time series associated with a plurality of merchants, and wherein each known time series is associated with metadata associated with a merchant associated with that known time series.
Clause 13: The system of any of clauses 8-12, wherein the plurality of learned templates includes thirty-two learned templates, wherein the plurality of pairwise distance matrices includes thirty-two pairwise distance matrices, wherein the tensor includes an input dimension of thirty-two, and wherein the feature vector for each known time series of the plurality of known time series includes a size sixty-four vector.
Clause 14: The system of any of clauses 8-13, wherein the residual network includes a two-dimensional residual network.
Clause 15: A computer program product including a non-transitory computer readable medium including program instructions which, when executed by at least one processor, cause the at least one processor to: obtain, from at least one database, a plurality of known time series; for each known time series of the plurality of known time series: compute a pairwise distance matrix between that known time series and each learned template of a plurality of learned templates to generate a plurality of pairwise distance matrices; stack the plurality of pairwise distance matrices together to generate a tensor; and process, with a residual network, the tensor, wherein the residual network receives, as input, the tensor, and provides, as output, a feature vector for that known time series; and provide the feature vector for each known time series of the plurality of known time series.
Clause 16: The computer program product of clause 15, wherein the program instructions, when executed by the at least one processor, further cause the at least one processor to: store, in the at least one database, the feature vector for each known time series of the plurality of known time series.
Clause 17: The computer program product of clause 15 or 16, wherein the program instructions, when executed by the at least one processor, further cause the at least one processor to: obtain an unknown time series; compute a pairwise distance matrix between the unknown time series and each learned template of the plurality of learned templates to generate a further plurality of pairwise distance matrices; stack the further plurality of pairwise distance matrices together to generate a further tensor; process, with the residual network, the further tensor, wherein the residual network receives, as input, the further tensor, and provides, as output, a feature vector for the unknown time series; for each known time series of the plurality of known time series stored in the database, determine, based on the stored feature vector for that known time series and the feature vector for the unknown time series, a distance between that known time series and the unknown time series; and provide, based on the distance between each known time series and the unknown time series, at least one known time series determined to be similar to the unknown time series.
Clause 18: The computer program product of any of clauses 15-17, wherein the residual network is trained using a loss function defined according to the following Equation:

$$\mathcal{L}(\theta;\mathcal{B})=-\frac{1}{m}\sum_{i=1}^{m}\ln\sigma\left(f_{\theta}(t_i,t_i^{+})-f_{\theta}(t_i,t_i^{-})\right)$$

where 𝓑 is a batch of training data 𝓑=[b1, . . . , bm], m is a batch size, each sample bi=(ti, ti+, ti−) in the batch is a tuple including a query time series ti, a positive time series ti+, and a negative time series ti−, σ(·) is a sigmoid function, and ƒθ(·, ·) is the residual network.
Clause 19: The computer program product of any of clauses 15-18, wherein the plurality of known time series includes a plurality of known transaction time series associated with a plurality of merchants, and wherein each known time series is associated with metadata associated with a merchant associated with that known time series.
Clause 20: The computer program product of any of clauses 15-19, wherein the plurality of learned templates includes thirty-two learned templates, wherein the plurality of pairwise distance matrices includes thirty-two pairwise distance matrices, wherein the tensor includes an input dimension of thirty-two, and wherein the feature vector for each known time series of the plurality of known time series includes a size sixty-four vector, and wherein the residual network includes a two-dimensional residual network.
These and other features and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structures and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosed subject matter.
Additional advantages and details are explained in greater detail below with reference to the non-limiting, exemplary embodiments that are illustrated in the accompanying schematic figures, in which:
For purposes of the description hereinafter, the terms “end,” “upper,” “lower,” “right,” “left,” “vertical,” “horizontal,” “top,” “bottom,” “lateral,” “longitudinal,” and derivatives thereof shall relate to the embodiments as they are oriented in the drawing figures. However, it is to be understood that the present disclosure may assume various alternative variations and step sequences, except where expressly specified to the contrary. It is also to be understood that the specific devices and processes illustrated in the attached drawings, and described in the following specification, are simply exemplary and non-limiting embodiments or aspects of the disclosed subject matter. Hence, specific dimensions and other physical characteristics related to the embodiments or aspects disclosed herein are not to be considered as limiting.
Some non-limiting embodiments or aspects are described herein in connection with thresholds. As used herein, satisfying a threshold may refer to a value being greater than the threshold, more than the threshold, higher than the threshold, greater than or equal to the threshold, less than the threshold, fewer than the threshold, lower than the threshold, less than or equal to the threshold, equal to the threshold, etc.
No aspect, component, element, structure, act, step, function, instruction, and/or the like used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items and may be used interchangeably with “one or more” and “at least one.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, and/or the like) and may be used interchangeably with “one or more” or “at least one.” Where only one item is intended, the term “one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based at least partially on” unless explicitly stated otherwise. In addition, reference to an action being “based on” a condition may refer to the action being “in response to” the condition. For example, the phrases “based on” and “in response to” may, in some non-limiting embodiments or aspects, refer to a condition for automatically triggering an action (e.g., a specific operation of an electronic device, such as a computing device, a processor, and/or the like).
As used herein, the term “communication” may refer to the reception, receipt, transmission, transfer, provision, and/or the like of data (e.g., information, signals, messages, instructions, commands, and/or the like). For one unit (e.g., a device, a system, a component of a device or system, combinations thereof, and/or the like) to be in communication with another unit means that the one unit is able to directly or indirectly receive information from and/or transmit information to the other unit. This may refer to a direct or indirect connection (e.g., a direct communication connection, an indirect communication connection, and/or the like) that is wired and/or wireless in nature. Additionally, two units may be in communication with each other even though the information transmitted may be modified, processed, relayed, and/or routed between the first and second unit. For example, a first unit may be in communication with a second unit even though the first unit passively receives information and does not actively transmit information to the second unit. As another example, a first unit may be in communication with a second unit if at least one intermediary unit processes information received from the first unit and communicates the processed information to the second unit. In some non-limiting embodiments or aspects, a message may refer to a network packet (e.g., a data packet and/or the like) that includes data. It will be appreciated that numerous other arrangements are possible.
As used herein, the term “computing device” may refer to one or more electronic devices configured to process data. A computing device may, in some examples, include the necessary components to receive, process, and output data, such as a processor, a display, a memory, an input device, a network interface, and/or the like. A computing device may be a mobile device. As an example, a mobile device may include a cellular phone (e.g., a smartphone or standard cellular phone), a portable computer, a wearable device (e.g., watches, glasses, lenses, clothing, and/or the like), a personal digital assistant (PDA), and/or other like devices. A computing device may also be a desktop computer or other form of non-mobile computer.
As used herein, the term “server” may refer to or include one or more computing devices that are operated by or facilitate communication and processing for multiple parties in a network environment, such as the Internet, although it will be appreciated that communication may be facilitated over one or more public or private network environments and that various other arrangements are possible. Further, multiple computing devices (e.g., servers, point-of-sale (POS) devices, mobile devices, etc.) directly or indirectly communicating in the network environment may constitute a “system.”
As used herein, the term “system” may refer to one or more computing devices or combinations of computing devices (e.g., processors, servers, client devices, software applications, components of such, and/or the like). Reference to “a device,” “a server,” “a processor,” and/or the like, as used herein, may refer to a previously-recited device, server, or processor that is recited as performing a previous step or function, a different device, server, or processor, and/or a combination of devices, servers, and/or processors. For example, as used in the specification and the claims, a first device, a first server, or a first processor that is recited as performing a first step or a first function may refer to the same or different device, server, or processor recited as performing a second step or a second function.
As used herein, the term “real-time” refers to performance of a task or tasks during another process or before another process is completed. For example, a real-time inference may be an inference that is obtained from a model before a payment transaction is authorized, completed, and/or the like.
Time series is a common data type analyzed for a variety of applications. For example, time series from different sensors on manufacturing machines may be examined by engineers for identifying ways to improve factories' efficiency, various biometric time series may be studied by doctors for medical research, and multiple streams of time series from operating payment networks may be monitored for unusual activities. As a large volume of time series data becomes available from various sources, an effective Content-based Time Series Retrieval (CTSR) system is needed to help users browse time series databases.
In the aforementioned example illustrated in
Design goals when building a CTSR system may include: 1) to effectively capture various concepts in time series from different domains, and 2) to be efficient during inference, given the real-time interactions of users with the system. A reason for a difference in inference time between CTSR systems may be the difference in the role of the neural network model.
Non-limiting embodiments or aspects of the present disclosure provide methods, systems, and computer program products for content-based time series retrieval that obtain, from at least one database, a plurality of known time series; for each known time series of the plurality of known time series: compute a pairwise distance matrix between that known time series and each learned template of a plurality of learned templates to generate a plurality of pairwise distance matrices; stack the plurality of pairwise distance matrices together to generate a tensor; process, with a residual network, the tensor, wherein the residual network receives, as input, the tensor, and provides, as output, a feature vector for that known time series; and provide the feature vector for each known time series of the plurality of known time series. Non-limiting embodiments or aspects of the present disclosure thus provide methods, systems, and computer program products for content-based time series retrieval enabled to obtain an unknown time series; compute a pairwise distance matrix between the unknown time series and each learned template of the plurality of learned templates to generate a further plurality of pairwise distance matrices; stack the further plurality of pairwise distance matrices together to generate a further tensor; process, with the residual network, the further tensor, wherein the residual network receives, as input, the further tensor, and provides, as output, a feature vector for the unknown time series; for each known time series of the plurality of known time series stored in the database, determine, based on the stored feature vector for that known time series and the feature vector for the unknown time series, a Euclidean distance between that known time series and the unknown time series; and identify, based on the Euclidean distance between each known time series and the unknown time series, at least one known time series determined to correspond to the unknown time series.
In this way, non-limiting embodiments or aspects of the present disclosure may provide an improved model architecture based on the RN2D model with improved efficiency, which may be referred to herein as Residual Network 2D with Template Learning (RN2Dw/T). As illustrated in example (c) of
Referring now to
In some non-limiting embodiments or aspects, transaction processing system 101 may communicate with merchant system 104 directly through a public or private network connection. Additionally or alternatively, transaction processing system 101 may communicate with merchant system 104 through payment gateway 102 and/or acquirer system 108. In some non-limiting embodiments or aspects, an acquirer system 108 associated with merchant system 104 may operate as payment gateway 102 to facilitate the communication of transaction requests from merchant system 104 to transaction processing system 101. Merchant system 104 may communicate with payment gateway 102 through a public or private network connection. For example, a merchant system 104 that includes a physical POS device may communicate with payment gateway 102 through a public or private network to conduct card-present transactions. As another example, a merchant system 104 that includes a server (e.g., a web server) may communicate with payment gateway 102 through a public or private network, such as a public Internet connection, to conduct card-not-present transactions.
In some non-limiting embodiments or aspects, transaction processing system 101, after receiving a transaction request from merchant system 104 that identifies an account identifier of a payor (e.g., such as an account holder) associated with an issued consumer device 110, may generate an authorization request message to be communicated to the issuer system 106 that issued the consumer device 110 and/or account identifier. Issuer system 106 may then approve or decline the authorization request and, based on the approval or denial, generate an authorization response message that is communicated to transaction processing system 101. Transaction processing system 101 may communicate an approval or denial to merchant system 104. When issuer system 106 approves the authorization request message, it may then clear and settle the payment transaction between the issuer system 106 and acquirer system 108.
The number and arrangement of systems and devices shown in
Referring now to
As shown in
With continued reference to
Device 200 may perform one or more processes described herein. Device 200 may perform these processes based on processor 204 executing software instructions stored by a computer-readable medium, such as memory 206 and/or storage component 208. A computer-readable medium may include any non-transitory memory device. A memory device includes memory space located inside of a single physical storage device or memory space spread across multiple physical storage devices. Software instructions may be read into memory 206 and/or storage component 208 from another computer-readable medium or from another device via communication interface 214. When executed, software instructions stored in memory 206 and/or storage component 208 may cause processor 204 to perform one or more processes described herein. Additionally, or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, embodiments described herein are not limited to any specific combination of hardware circuitry and software. The term “configured to,” as used herein, may refer to an arrangement of software, device(s), and/or hardware for performing and/or enabling one or more functions (e.g., actions, processes, steps of a process, and/or the like). For example, “a processor configured to” may refer to a processor that executes software instructions (e.g., program code) that cause the processor to perform one or more functions.
The following conventions may be used herein for notations: lowercase letters (e.g., x) may denote scalars, boldface lowercase letters (e.g., x) may denote vectors, uppercase letters (e.g., X) may denote matrices, boldface uppercase letters (e.g., X) may denote tensors, and calligraphic letters (e.g., 𝒳) may denote sets.
A Content-based Time Series Retrieval (CTSR) problem may be formulated as follows: Given a set of time series 𝒳=[x1, . . . , xn] and any query time series q, obtain a relevance score function ƒ(·, ·) that satisfies the property that ƒ(xi, q)>ƒ(xj, q) if xi is more relevant to q than xj. The scoring function can be either a predefined similarity/distance function or a trainable function that is optimized using the metadata associated with each time series in 𝒳.
The time series retrieval problem may be formulated in two ways. The first is also known as the time series similarity search problem, where the goal is to find the top k time series that are most similar to a given query based on a fixed distance function. Because the distance function is fixed, the focus of this type of research is on efficiency, with speed up achieved through techniques such as lower bounding, early abandoning, and/or indexing. If this problem is compared with the above problem statement, it can be seen that a goal of techniques for addressing the time series similarity search problem is different from that for addressing the above problem statement.
The second type of problem formulation is more aligned with that for addressing the above problem statement, wherein an objective is to develop a model or scoring function to aid users in retrieving relevant time series from a database based on the query time series submitted. However, existing models for addressing this second type of problem formulation are designed to address multivariate time series, which if applied to the above problem statement, would simply reduce to a standard long short-term memory network.
Euclidean distance and dynamic time warping distance are popular and straightforward tools for analyzing time series data. They are widely used in various tasks such as similarity search, classification, and anomaly detection, and both distance functions may be readily applied to the above problem. Another family of methods that can be applied to the above problem is neural networks, especially those capable of modeling sequential data. For example, long short-term memory networks, gated recurrent unit networks, transformers, and convolutional neural networks have shown effectiveness in tasks such as time series classification, forecasting, and anomaly detection.
Six existing baseline methods are now presented. Following that, the previously noted RN2D method is introduced, and benefits of the RN2D method are contrasted with those of the other baseline methods. After introducing the RN2D method, further details are provided regarding a Residual Network 2D with Template Learning (RN2Dw/T) method according to non-limiting embodiments or aspects, which solves an efficiency issue associated with the design of RN2D.
The six existing baseline methods considered include Euclidean Distance (ED), Dynamic Time Warping (DTW), Long Short-Term Memory network (LSTM), Gated Recurrent Unit network (GRU), Transformer (TF), and Residual Network 1D (RN1D).
The Euclidean distance may be computed between the query time series and the time series in the collection. The collection may then be sorted based on the distances. This may be the simplest approach for solving the CTSR problem.
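For purposes of illustration only, the ED baseline described above may be sketched in Python/NumPy as follows; the function and array names (e.g., query, collection) are hypothetical, and the series are assumed to be equal-length and pre-normalized:

```python
import numpy as np

def euclidean_retrieve(query: np.ndarray, collection: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k collection time series closest to the query.

    query has shape (length,); collection has shape (n, length).
    """
    # Euclidean distance between the query and every series in the collection.
    distances = np.linalg.norm(collection - query[None, :], axis=1)
    # Smaller distance means higher relevance, so sort ascending and keep the top k.
    return np.argsort(distances)[:k]
```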
DTW is similar to the ED baseline, but uses the DTW distance instead. The DTW distance is considered as a simple yet effective baseline for time series classification problems.
The LSTM is one of the most popular Recurrent Neural Networks (RNNs) used for modeling sequential data. LSTM models may be optimized using the Siamese network architecture (see e.g., example (a) of
The GRU is another popular RNN architecture widely used for modeling sequential data. To optimize the GRU model, a similar approach as for the LSTM model may be applied, wherein the LSTM cells in the RNN architecture are replaced with GRU cells.
The TF is an alternative to the RNNs for sequence modeling. To learn the hidden representation for the input time series, the transformer encoder proposed by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin in the 2017 paper entitled "Attention is all you need," published in Advances in Neural Information Processing Systems, may be used. The RNNs used in the previous two methods (i.e., LSTM and GRU) may be replaced with transformer encoders, resulting in a transformer-based Siamese network architecture instead of an RNN-based one.
The RN1D is a time series classification model inspired by the success of residual networks in computer vision. The RN1D employs 1D convolutional layers instead of 2D convolutional layers. Extensive evaluations have demonstrated that the RN1D design is among the strongest models for time series classification. The RN1D model may also be optimized in a Siamese network (see e.g., example (a) of
Each of the ED and DTW methods requires no training phase as there are no parameters to optimize for either method. The DTW method is the more effective method of the two for time series data, because the DTW method considers all alignments between the input time series. The computation of DTW distance can be abstracted into a two-stage process. In the first stage, a pairwise distance matrix D∈ℝ^(w×h) is computed from the input time series a=[a1, . . . , aw] (where w is the length of a) and b=[b1, . . . , bh] (where h is the length of b) as D[i, j]=|ai−bj|. In the second stage, a fixed recursion function is applied to D (i.e., D[i, j]←D[i, j]+min(D[i−1, j], D[i, j−1], D[i−1, j−1])) for each element in D. Consequently, the DTW method can be viewed as running a predefined function on the pair-wise distance matrix between the input time series.
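As a non-limiting illustration of the two-stage process described above, a straightforward Python/NumPy sketch is shown below; the boundary handling on the first row and column reflects one common convention and is an assumption rather than a detail taken from this description:

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Two-stage DTW: build the pairwise distance matrix, then apply the recursion."""
    w, h = len(a), len(b)
    # Stage 1: pairwise distance matrix D[i, j] = |a_i - b_j|.
    D = np.abs(a[:, None] - b[None, :]).astype(float)
    # Stage 2: D[i, j] += min(D[i-1, j], D[i, j-1], D[i-1, j-1]) for each element.
    for i in range(w):
        for j in range(h):
            if i == 0 and j == 0:
                continue  # the top-left cell keeps its local cost
            candidates = []
            if i > 0:
                candidates.append(D[i - 1, j])
            if j > 0:
                candidates.append(D[i, j - 1])
            if i > 0 and j > 0:
                candidates.append(D[i - 1, j - 1])
            D[i, j] += min(candidates)
    return float(D[w - 1, h - 1])
```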
The remaining four baseline methods use the Siamese network distance learning framework (see e.g., example (a) in
Referring now to
The design of RN2D is motivated by the deep residual networks used in computer vision. Within each building block, the input tensor Xin∈ℝ^(w×h×n_in) may first be projected to ℝ^(w×h×n_neck) space using a 1×1 convolutional layer. Subsequently, the tensor is passed through a ReLU layer before transforming it further to ℝ^(w/2×h/2×n_neck) space using a 3×3 convolutional layer with stride two. After another ReLU layer, the intermediate representation may be projected to ℝ^(w/2×h/2×n_out) space with a 1×1 convolutional layer, with the output of the 1×1 convolutional layer referred to as Xout. As the sizes of Xin and Xout do not match, Xin may not be directly added to Xout for the skip connection, and Xin may be processed with a 1×1 convolutional layer before adding it to Xout. After the addition, the merged representation may be processed with a ReLU and exits the building block. The output will be in ℝ^(w/2×h/2×n_out) space given the input is in ℝ^(w×h×n_in) space.
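By way of a non-limiting illustration, the building block described above may be sketched in PyTorch as follows; the class and argument names are hypothetical, and the stride of two on the skip-path 1×1 convolution is an assumption made so that the spatial sizes match before the addition:

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """1x1 conv -> ReLU -> 3x3 conv (stride 2) -> ReLU -> 1x1 conv, plus a 1x1 projection skip."""

    def __init__(self, n_in: int, n_neck: int, n_out: int):
        super().__init__()
        self.reduce = nn.Conv2d(n_in, n_neck, kernel_size=1)
        self.conv = nn.Conv2d(n_neck, n_neck, kernel_size=3, stride=2, padding=1)
        self.expand = nn.Conv2d(n_neck, n_out, kernel_size=1)
        # Project (and downsample) the input so it can be added to the output.
        self.skip = nn.Conv2d(n_in, n_out, kernel_size=1, stride=2)
        self.relu = nn.ReLU()

    def forward(self, x_in: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.reduce(x_in))
        x = self.relu(self.conv(x))
        x_out = self.expand(x)
        return self.relu(x_out + self.skip(x_in))
```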
Still referring to the RN2D method, a pairwise distance matrix D∈ℝ^(w×h) may be computed similar to the DTW method. The ith and jth position of D may be computed with D[i, j]=|ai−bj|. Before applying the convolutional layer, the shape of D may be converted to w×h×1 by adding an extra dimension. Next, a 7×7 convolutional layer with a step size of two may be used to project D to ℝ^(w/2×h/2×64) space. After a ReLU layer, the intermediate representation may pass through eight building blocks with the 64→16→64 setting. A global average pooling layer may then be applied to reduce the spatial dimension, and the output of the global average pooling layer may include a size sixty-four vector. Finally, a linear layer may project the vector to a scalar number, which may include a relevance score between the two input time series. In some non-limiting embodiments or aspects, the plurality of learned templates includes thirty-two learned templates, wherein the plurality of pairwise distance matrices includes thirty-two pairwise distance matrices, wherein the tensor includes an input dimension of thirty-two, and wherein the feature vector for each known time series of the plurality of known time series includes a size sixty-four vector.
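Continuing the illustration (and reusing the hypothetical BottleneckBlock class from the sketch above), the RN2D relevance scorer described in this paragraph may be sketched roughly as follows; padding, initialization, and other unstated details are assumptions:

```python
import torch
import torch.nn as nn

class RN2D(nn.Module):
    """Score a pair of time series from their w x h pairwise distance matrix."""

    def __init__(self, n_blocks: int = 8):
        super().__init__()
        self.stem = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
        self.relu = nn.ReLU()
        # Eight bottleneck blocks with the 64 -> 16 -> 64 setting.
        self.blocks = nn.Sequential(*[BottleneckBlock(64, 16, 64) for _ in range(n_blocks)])
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.head = nn.Linear(64, 1)         # size-64 vector -> scalar relevance score

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Pairwise distance matrix D[i, j] = |a_i - b_j|; a: (batch, w), b: (batch, h).
        D = torch.abs(a.unsqueeze(2) - b.unsqueeze(1))  # (batch, w, h)
        x = self.relu(self.stem(D.unsqueeze(1)))        # add channel dim -> (batch, 64, w/2, h/2)
        x = self.blocks(x)
        x = self.pool(x).flatten(1)                     # (batch, 64)
        return self.head(x).squeeze(-1)                 # (batch,) relevance scores
```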
As shown in example (b) of
Referring now to
As shown in
Here, the first two differences between the models may exist because the RN2Dw/T model aims to extract the feature vector of the input time series, while the RN2D model computes the relevant score between the two input time series.
The third difference is in the pairwise distance matrix computation step, which is also a reason why an RN2Dw/T model according to some non-limiting embodiments or aspects is much faster than the RN2D model. The pairwise distance matrices may be computed as follows: given an input time series a=[a1, . . . , aw] and the kth template tk=[tk,1, . . . , tk,w], the kth pairwise distance matrix Dk∈ℝ^(w×h) may be computed with Dk[i, j]=|ai−tk,j|. The pairwise distance matrix for each of the plurality of templates (e.g., 32 templates, etc.) may be computed, resulting in a plurality of w×h matrices (e.g., 32 w×h matrices, etc.). The plurality of templates (e.g., the 32 templates, etc.) may be learned during the training phase and may include reference time series that help the model project the input time series to Euclidean space using the 2D convolutional design. Then, the plurality of w×h matrices (e.g., the 32 w×h matrices, etc.) may be stacked together to form a w×h×32 tensor for the first 2D convolutional layer. The w×h×32 tensor may be the output of the pairwise distance matrix computation step for the RN2Dw/T model.
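As a further non-limiting illustration (again reusing the hypothetical BottleneckBlock class from the earlier sketch), the template-based feature extractor described above may be organized roughly as follows, with the templates stored as a trainable parameter that is learned jointly with the network; the template length and all names are assumptions:

```python
import torch
import torch.nn as nn

class RN2DwT(nn.Module):
    """Map a single time series to a feature vector via learned templates."""

    def __init__(self, template_len: int, n_templates: int = 32, n_blocks: int = 8, out_dim: int = 64):
        super().__init__()
        # Templates are learned during training and act as reference time series.
        self.templates = nn.Parameter(torch.randn(n_templates, template_len))
        self.stem = nn.Conv2d(n_templates, 64, kernel_size=7, stride=2, padding=3)
        self.relu = nn.ReLU()
        self.blocks = nn.Sequential(*[BottleneckBlock(64, 16, 64) for _ in range(n_blocks)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(64, out_dim)  # vector output instead of a scalar score

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        # D_k[i, j] = |a_i - t_{k, j}| for every template k, stacked along the channel axis.
        # a: (batch, w); templates: (K, template_len) -> D: (batch, K, w, template_len).
        D = torch.abs(a[:, None, :, None] - self.templates[None, :, None, :])
        x = self.relu(self.stem(D))
        x = self.blocks(x)
        x = self.pool(x).flatten(1)  # (batch, 64)
        return self.head(x)          # (batch, out_dim) feature vectors
```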
The fourth difference between the two models may be to accommodate the fact that the input tensor to the first convolutional layer for the RN2Dw/T model may be w×h×32, while the input tensor for the first convolutional layer in the RN2D model is w×h×1.
As shown in example (c) of
In some non-limiting embodiments or aspects, a Bayesian personalized ranking loss as described by Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme in the 2009 paper entitled "BPR: Bayesian personalized ranking from implicit feedback" in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence at pages 452-461, the entire disclosure of which is hereby incorporated by reference in its entirety, may be used to train or optimize the RN2Dw/T model. A Bayesian personalized ranking loss is appropriate for a CTSR problem because it is a "Learning to Rank" problem. Given a batch of training data 𝓑=[b1, . . . , bm], the loss function may be defined according to the following Equation (1):

$$\mathcal{L}(\theta;\mathcal{B})=-\frac{1}{m}\sum_{i=1}^{m}\ln\sigma\left(f_{\theta}(t_i,t_i^{+})-f_{\theta}(t_i,t_i^{-})\right)\tag{1}$$

where 𝓑 is the batch of training data 𝓑=[b1, . . . , bm], m is a batch size, each sample bi=(ti, ti+, ti−) in the batch is a tuple including a query (or anchor) time series ti, a positive time series ti+, and a negative time series ti−, σ(·) is a sigmoid function, and ƒθ(·, ·) is the residual network or model. In some non-limiting embodiments or aspects, the AdamW optimizer as described by Ilya Loshchilov and Frank Hutter in the 2018 paper entitled "Decoupled Weight Decay Regularization" in International Conference on Learning Representations, the entire disclosure of which is hereby incorporated by reference in its entirety, may be used to train the RN2Dw/T using the Bayesian personalized ranking loss.
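For illustration only, and under the assumption (not stated in this paragraph) that the relevance score ƒθ(·, ·) for the RN2Dw/T model is taken as the negative Euclidean distance between the two feature vectors, a minimal PyTorch sketch of Equation (1) with the AdamW optimizer may look as follows; the model, loader, and parameter names are hypothetical:

```python
import torch
import torch.nn.functional as F

def bpr_loss(model, queries, positives, negatives):
    """Equation (1): -1/m * sum_i ln sigma( f(t_i, t_i+) - f(t_i, t_i-) ).

    f is assumed here to be the negative Euclidean distance between feature vectors.
    """
    zq, zp, zn = model(queries), model(positives), model(negatives)
    score_pos = -torch.norm(zq - zp, dim=1)  # higher score = more relevant
    score_neg = -torch.norm(zq - zn, dim=1)
    return -F.logsigmoid(score_pos - score_neg).mean()

# Hypothetical training-loop fragment using the AdamW optimizer.
# model = RN2DwT(template_len=128)
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# for queries, positives, negatives in train_loader:
#     loss = bpr_loss(model, queries, positives, negatives)
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()
```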
Referring now to
As shown in
In some non-limiting embodiments or aspects, the plurality of known time series includes a plurality of known transaction time series associated with a plurality of merchants, and wherein each known time series is associated with metadata associated with a merchant associated with that known time series. For example, a known time series may include a time series signature representative of a use of electronic payment processing network 100 by merchant system 104. As an example, a known (or unknown) time series may include transaction data associated with a plurality of transactions and/or a plurality of time points. As an example, a payment transaction may include transaction parameters and/or features associated with the payment transaction. Transaction parameters and/or features (e.g., categorical features, numerical features, local features, graph features or embeddings, etc.) associated with a payment transaction may include transaction parameters of the transaction, features determined based thereon (e.g., using feature engineering, etc.), and/or the like, such as an account identifier (e.g., a PAN, etc.), a transaction amount, a transaction date and/or time, a type of products and/or services associated with the transaction, a conversion rate of currency, a type of currency, a merchant type, a merchant name, a merchant location, and/or the like. However, non-limiting embodiments or aspects are not limited thereto, and transaction parameters and/or features of a transaction may include any data including any type of parameters associated with any type of transaction.
For example, the kth pairwise distance matrix Dk∈ℝ^(w×h) between a known time series and the kth learned template may be computed with Dk[i, j]=|ai−tk,j|. The pairwise distance matrix for each of the plurality of templates (e.g., 32 templates, etc.) may be computed, resulting in a plurality of w×h matrices (e.g., 32 w×h matrices, etc.). The plurality of templates (e.g., the 32 templates, etc.) may be learned during the training phase and may include reference time series that help the model project the input time series to Euclidean space using the 2D convolutional design.
As shown in
The tensor may be processed with the residual network, which may include a first convolutional layer that projects the tensor to ℝ^(w/2×h/2×64) space (e.g., to project D to ℝ^(w/2×h/2×64) space, etc.) and a rectified linear unit (ReLU) layer. After the ReLU layer, an intermediate representation may pass through a plurality of building blocks (e.g., eight building blocks with the 64→16→64 setting, etc.), and a global average pooling layer may be applied to reduce the spatial dimension. The output of the global average pooling layer may be multi-dimensional (e.g., a size sixty-four vector, etc.), and a last linear layer of the residual network may output vectors instead of scalars as in an RN2D model.
As shown in
As shown in
In some non-limiting embodiments or aspects, an unknown time series includes an unknown transaction time series associated with a merchant and/or including metadata associated with a merchant. For example, an unknown time series may include a time series signature representative of a use of electronic payment processing network 100 by merchant system 104. As an example, an unknown (or known) time series may include transaction data associated with a plurality of transactions and/or a plurality of time points. As an example, a payment transaction may include transaction parameters and/or features associated with the payment transaction. Transaction parameters and/or features (e.g., categorical features, numerical features, local features, graph features or embeddings, etc.) associated with a payment transaction may include transaction parameters of the transaction, features determined based thereon (e.g., using feature engineering, etc.), and/or the like, such as an account identifier (e.g., a PAN, etc.), a transaction amount, a transaction date and/or time, a type of products and/or services associated with the transaction, a conversion rate of currency, a type of currency, a merchant type, a merchant name, a merchant location, and/or the like. However, non-limiting embodiments or aspects are not limited thereto, and transaction parameters and/or features of a transaction may include any data including any type of parameters associated with any type of transaction.
For example, the kth further pairwise distance matrix Dk∈ℝ^(w×h) between the unknown time series and the kth learned template may be computed with Dk[i, j]=|ai−tk,j|. The further pairwise distance matrix for each of the plurality of templates (e.g., 32 templates, etc.) may be computed, resulting in a further plurality of w×h matrices (e.g., 32 w×h matrices, etc.).
As shown in
The further tensor may be processed with the residual network, which may include the first convolutional layer that projects the further tensor to ℝ^(w/2×h/2×64) space (e.g., to project D to ℝ^(w/2×h/2×64) space, etc.) and the ReLU layer. After the ReLU layer, an intermediate representation may pass through the plurality of building blocks (e.g., eight building blocks with the 64→16→64 setting, etc.), and the global average pooling layer may be applied to reduce the spatial dimension. The output of the global average pooling layer may be multi-dimensional (e.g., a size sixty-four vector, etc.), and the last linear layer of the residual network may output vectors instead of scalars as in an RN2D model.
As shown in
As shown in
In this section, results of experiments on a CTSR benchmark dataset created from the UCR Archive and a transaction dataset based on a real business problem (see e.g.
The CTSR benchmark dataset is created from the UCR Archive, which is a collection of 128 time series classification datasets from various domains such as motion, power demand, and traffic. The UCR Archive is widely used for benchmarking time series classification algorithms. To convert the UCR Archive to a CTSR benchmark dataset, the following steps are used.
First, the following three performance measurements are discussed: PREC@10, AP@10, and NDCG@10. When comparing the performance of the two non-neural network baselines (ED and DTW), it is observed that DTW significantly outperforms ED in all three performance measurements. This suggests that using alignment information helps with the CTSR problem, and similar conclusions have been drawn for the time series classification problem.
When considering the first four neural network baselines (i.e., LSTM, GRU, TF, and RN1D), each of them significantly outperforms the DTW method, which demonstrates that using a high-capacity model helps with the CTSR problem. One possible reason for this is that the CTSR dataset consists of time series from many different domains, and higher capacity models are required for learning diverse patterns within the data. Among the four methods, LSTM outperforms the second best significantly in all three performance measurements.
The RN2D method, a high-capacity model utilizing alignment information, significantly outperforms all other methods according to the t-test results. When comparing the RN2Dw/T method according to non-limiting embodiments or aspects with the RN2D method, the former achieves higher performance in all three performance measurements, although the difference is not significant. Thus, each of the RN2Dw/T methods according to non-limiting embodiments or aspects and the RN2D method can be considered as the better performing methods for the CTSR dataset in terms of the three performance measurements.
When considering the query time, the eight tested methods can be grouped into two categories: slower methods (i.e., DTW and RN2D) with a query time of over 30 seconds, and faster methods (i.e., ED, LSTM, GRU, TF, RN1D, and RN2Dw/T) where each query takes less than 100 milliseconds. The main difference between the faster and slower groups is that all fast methods compute the relevance score in Euclidean space, while the slower methods compute the scores in other spaces. Overall, the RN2Dw/T method according to non-limiting embodiments or aspects is the best method as it is effective in retrieving relevant time series and efficient in terms of query time.
As shown in
The following observations may be made by examining the retrieved time series for different methods illustrated in
To evaluate the effectiveness and efficiency of different CTSR system designs in addressing the business problem presented in
As shown in
The performance differences between the tested methods are examined under different settings of k, and the results are presented in
To perform approximate nearest neighbor search, the nearest neighbor descent method is used for constructing k-neighbor graphs. The PyNNDescent library is used for implementing the method. The query time is notably reduced by replacing exact nearest neighbor search with the approximate nearest neighbor search method. Moreover, the performances (i.e., PREC@10, AP@10, and NDCG@10) remain exactly the same as the numbers presented in the table of
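By way of a non-limiting illustration, replacing exact nearest neighbor search over the stored feature vectors with approximate search using the PyNNDescent library may look roughly as follows; the arrays and parameter values shown are placeholders:

```python
import numpy as np
from pynndescent import NNDescent

# Stand-in for the stored feature vectors of the known time series (n x 64).
known_features = np.random.rand(10000, 64).astype(np.float32)

# Build a k-neighbor graph over the stored feature vectors.
index = NNDescent(known_features, metric="euclidean", n_neighbors=30)
index.prepare()  # pre-build the search graph so later queries are fast

# Stand-in for feature vectors of unknown (query) time series (q x 64).
query_features = np.random.rand(5, 64).astype(np.float32)

# Retrieve the 10 approximate nearest known time series per query.
neighbor_ids, distances = index.query(query_features, k=10)
```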
Accordingly, non-limiting embodiments or aspects of the present disclosure may provide an effective and efficient CTSR model that outperforms alternative models, while still providing reasonable inference runtimes. For example, non-limiting embodiments or aspects of the present disclosure may outperform existing methods for time series retrieval in terms of both effectiveness and efficiency. Non-limiting embodiments or aspects of the present disclosure may be used to identify business types in electronic payment networks, and/or an efficiency of non-limiting embodiments or aspects of the present disclosure may be enhanced by incorporating low-bit representation techniques.
Aspects described include artificial intelligence or other operations whereby the system processes inputs and generates outputs with apparent intelligence. The artificial intelligence may be implemented in whole or in part by a model. A model may be implemented as a machine learning model. The learning may be supervised, unsupervised, reinforced, or a hybrid learning whereby multiple learning techniques are employed to generate the model. The learning may be performed as part of training. Training the model may include obtaining a set of training data and adjusting characteristics of the model to obtain a desired model output. For example, three characteristics may be associated with a desired item location. In such instance, the training may include receiving the three characteristics as inputs to the model and adjusting the characteristics of the model such that for each set of three characteristics, the output device state matches the desired device state associated with the historical data.
In some implementations, the training may be dynamic. For example, the system may update the model using a set of events. The detectable properties from the events may be used to adjust the model.
The model may be an equation, artificial neural network, recurrent neural network, convolutional neural network, decision tree, or other machine-readable artificial intelligence structure. The characteristics of the structure available for adjusting during training may vary based on the model selected. For example, if a neural network is the selected model, characteristics may include input elements, network layers, node density, node activation thresholds, weights between nodes, input or output value weights, or the like. If the model is implemented as an equation (e.g., regression), the characteristics may include weights for the input parameters, thresholds, or limits for evaluating an output value, or criterion for selecting from a set of equations.
Once a model is trained, retraining may be included to refine or update the model to reflect additional data or specific operational conditions. The retraining may be based on one or more signals detected by a device described herein or as part of a method described herein. Upon detection of the designated signals, the system may activate a training process to adjust the model as described.
Further examples of machine learning and modeling features which may be included in the embodiments discussed above are described in “A survey of machine learning for big data processing” by Qiu et al. in EURASIP Journal on Advances in Signal Processing (2016) which is hereby incorporated by reference in its entirety.
Although embodiments have been described in detail for the purpose of illustration, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed embodiments or aspects, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any embodiment or aspect can be combined with one or more features of any other embodiment or aspect. In fact, any of these features can be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of possible implementations includes each dependent claim in combination with every other claim in the claim set.
The present application is the United States national phase of International Patent Application No. PCT/US24/31934, filed May 31, 2024, and claims the benefit of U.S. Patent Provisional Application No. 63/505,570, filed Jun. 1, 2023, the disclosures of which are hereby incorporated by reference in their entireties.