PRIVATE SYNTHETIC TIME SERIES DATA GENERATION

Information

  • Patent Application
  • Publication Number
    20240249021
  • Date Filed
    September 27, 2023
  • Date Published
    July 25, 2024
  • Inventors
    • LAMP; Josephine (Charlottesville, VA, US)
Abstract
A method and system for generating synthetic data is provided. Longitudinal time series data are retrieved, and a neural network is trained to generate synthetic time series data that satisfies a privacy metric based on the longitudinal time series data. The longitudinal time series data are unlabeled and univariate.
Description
BACKGROUND

The present disclosure relates to data processing systems. More particularly, the present disclosure relates to private synthetic time series data generation for data processing systems.


Sharing patients' medical longitudinal time series data may enable improved therapy development and technological advances. For example, sharing patients' measured analyte time series data can contribute to the understanding of associated disease mechanisms and the development of technology to improve these patients' qualities of life. Unsurprisingly, there are serious legal and privacy issues that arise when sharing patients' medical longitudinal time series data, such as those described by the Health Insurance Portability and Accountability Act of 1996 (known as HIPAA).





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 depicts a block diagram of an example system for generating synthetic data, in accordance with embodiments of the present disclosure.



FIG. 2 depicts an example artificial neural network (ANN), in accordance with embodiments of the present disclosure.



FIGS. 3A, 3B, 3C and 3D depict different views of an example recurrent neural network (RNN), in accordance with embodiments of the present disclosure.



FIG. 3E depicts an example data flow diagram for a hidden recurrent module, in accordance with embodiments of the present disclosure.



FIG. 4A depicts a view of an example long short-term memory (LSTM) network, in accordance with embodiments of the present disclosure.



FIGS. 4B and 4C depict example data flow diagrams for an LSTM cell, in accordance with embodiments of the present disclosure.



FIG. 5 depicts an example data flow diagram for a differential-privacy generative adversarial network (DP-GAN), in accordance with embodiments of the present disclosure.



FIG. 6 depicts an example loss function diagram for training the DP-GAN depicted in FIG. 5, in accordance with embodiments of the present disclosure.



FIG. 7 depicts an example data flow diagram for generating batched original data for training the DP-GAN depicted in FIG. 5, in accordance with embodiments of the present disclosure.



FIGS. 8A and 8B depict example data flow diagrams for generating synthetic data by the DP-GAN depicted in FIG. 5, in accordance with embodiments of the present disclosure.



FIG. 9A depicts an example data flow diagram for a motif causality module, in accordance with embodiments of the present disclosure.



FIG. 9B depicts an example data flow diagram for generating motif sequence blocks for training the motif causality module depicted in FIG. 9A, in accordance with embodiments of the present disclosure.



FIG. 10A depicts a data flow diagram for a motif network within the motif causality module depicted in FIG. 9A, in accordance with embodiments of the present disclosure.



FIG. 10B depicts an example data flow diagram for training a neural network within the motif network depicted in FIG. 10A, in accordance with embodiments of the present disclosure.



FIG. 11A depicts an example motif causality matrix, in accordance with embodiments of the present disclosure.



FIG. 11B depicts example motif time series data for two motif causality matrix entries, in accordance with embodiments of the present disclosure.



FIG. 12A depicts traditional time series data generation.



FIG. 12B depicts motif causality time series data generation, in accordance with embodiments of the present disclosure.



FIG. 13 depicts a comparison of longitudinal time series data and synthetic time series data, in accordance with embodiments of the present disclosure.



FIG. 14 depicts a flow chart representing functionality associated with generating synthetic data, in accordance with embodiments of the present disclosure.





DETAILED DESCRIPTION

One potential technical solution to the problem of sharing patients' medical longitudinal time series data is to generate synthetic (fake) time series data based on the patients' original (real) time series data, such as, for example, a patient's measured glucose traces. However, the synthetic time series data must provide a strong privacy guarantee and protect the privacy of the patients' medical longitudinal time series data while emulating certain important characteristics of the original time series data. A privacy guarantee refers to the degree to which sensitive data, such as a patient's medical data, is protected. A formal notion of a strong privacy guarantee ensures that the probability of disclosing sensitive data is extremely small (e.g., close to zero).


A variety of methodologies may be used to generate synthetic time series data, such as machine learning (ML) techniques, neural networks (NNs), artificial neural networks (ANNs), etc. These methods use training data that may include labels (i.e., labeled data), which are outcomes or labeled parts of the traces that guide the synthetic data generation, or additional information such as multiple variables per time step (i.e., multivariate data), metadata or auxiliary features (information computed during the model training). For example, generative adversarial networks (GANs) may be used to generate synthetic data based on original data. And, while GANs may be trained to generate synthetic time series data based on original time series data, these GANs do not inherently protect the privacy of the original time series data.


Synthetic time series data that protects the privacy of the patients' medical longitudinal time series data may be publicly shared and integrated into many practical applications, such as, for example, blood glucose forecasting, artificial pancreatic systems, computer-based medical diagnostic methodologies, population-level medical studies, etc.


Embodiments of the present disclosure advantageously provide a differential-privacy generative adversarial network (DP-GAN) architecture that includes a motif causality module as well as autoencoder, generator, and discriminator modules. The autoencoder module includes an embedder module and a recovery module. Each module may include, inter alia, one or more ANNs, such as RNNs, LSTM networks, etc., as described below.


Further, embodiments of the present disclosure advantageously provide DP-GAN training methods that include original data, motif data and synthetic data processing techniques, an integrated differential privacy metric, and a loss function that characterizes relationships between important motifs in the original time series data, as described below. A motif is a short, ordered sequence of time steps from a time series (or trace) that characterizes important events in the time series data, such as peaks, troughs, etc. In the context of the present disclosure, motifs are not temporally dependent and do not form recurring temporal patterns.


Importantly, certain embodiments of the present disclosure advantageously relate to training the DP-GAN using unlabeled and univariate original data without any auxiliary (additional) information.



FIG. 1 depicts a block diagram of system 100 for generating synthetic data, in accordance with embodiments of the present disclosure.


Generally, system 100 includes a computer, server, etc., that has one or more single-core or multi-core processors, specialized processors, etc., that are configured to train a neural network, based on longitudinal time series data, to generate synthetic time series data that satisfies a privacy metric.


More particularly, system 100 includes computer 110 coupled to one or more networks 172, one or more I/O devices 182, and one or more displays 192. Computer 110 includes bus 120 coupled to one or more processors 130, storage element or memory 160, one or more communication interfaces 170, one or more I/O interfaces 180, and display interface 190. In many embodiments, computer 110 also includes one or more specialized processors, such as, for example, graphics processing units (GPUs) 140, neural processing units (NPUs) 150, etc. Generally, communication interface(s) 170 are coupled to network(s) 172 using a wired or wireless connection, I/O interface(s) 180 are coupled to I/O device(s) 182 using a wired or wireless connection, and display interface 190 is typically coupled to display(s) 192 using a wired connection.


Bus 120 is a communication system that transfers data between processor(s) 130, memory 160, communication interface(s) 170, I/O interface(s) 180, and display interface 190. In many embodiments, bus 120 also transfers data between these components and GPU(s) 140 and/or NPU(s) 150, as well as other components not depicted in FIG. 1.


Processor(s) 130 include one or more general-purpose or application-specific microprocessors that execute instructions to perform control, computation, input/output, etc. functions for computer 110. Each processor 130 may include a single integrated circuit, such as a micro-processing device, or multiple integrated circuit devices and/or circuit boards working in cooperation to accomplish the appropriate functionality. In addition, processor(s) 130 may execute computer programs or modules, such as operating system 162, software modules 164, etc., stored within memory 160. For example, software modules 164 may include a neural network that includes one or more artificial neural networks (ANNs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, convolutional neural networks (CNNs), etc.


Generally, memory 160 stores instructions for execution by processor(s) 130 as well as data. Memory 160 may include a variety of non-transitory computer-readable media that may be accessed by processor(s) 130 as well as other components. In various embodiments, memory 160 may include volatile and nonvolatile media, non-removable media and/or removable media. For example, memory 160 may include any combination of random access memory (RAM), dynamic RAM (DRAM), static RAM (SRAM), read only memory (ROM), flash memory, cache memory, and/or any other type of non-transitory computer-readable medium.


Memory 160 contains various components for retrieving, presenting, modifying, and storing data 166. For example, memory 160 stores software modules 164 that provide functionality when executed by processor(s) 130. Operating system 162 provides operating system functionality for computer 110. Software modules 164 provide various functionality, as described above. Data 166 may include data associated with operating system 162, software modules 164, etc.


Communication interface(s) 170 are configured to transmit data to and from one or more network(s) 172 using one or more wired and/or wireless connections. Network(s) 172 may include one or more local area networks, wide area networks, the Internet, etc., which may execute various network protocols, such as, for example, wired and/or wireless Ethernet, Bluetooth, etc. Network(s) 172 may also include various combinations of wired and/or wireless physical layers, such as, for example, copper wire or coaxial cable networks, fiber optic networks, Bluetooth wireless networks, WiFi wireless networks, CDMA, FDMA and TDMA cellular wireless networks, etc.


I/O interface(s) 180 are configured to transmit and/or receive data from I/O device(s) 182. I/O interface(s) 180 enable connectivity between processor(s) 130, memory 160 and I/O device(s) 182 by encoding data to be sent from processor 130 or memory 160 to I/O device(s) 182, and decoding data received from I/O device(s) 182 for processor(s) 130 or memory 160. Generally, data may be sent over wired and/or wireless connections. For example, I/O interface(s) 180 may include one or more wired communications interfaces, such as USB, Ethernet, etc., and/or one or more wireless communications interfaces, coupled to one or more antennas, such as WiFi, Bluetooth, cellular, etc.


Generally, I/O device(s) 182 provide input to computer 110 and/or output from computer 110. As discussed above, I/O device(s) 182 are operably connected to computer 110 using a wired and/or wireless connection. I/O device(s) 182 may include a local processor coupled to a communication interface that is configured to communicate with computer 110 using the wired and/or wireless connection. For example, I/O device(s) 182 may include a keyboard, mouse, touch pad, joystick, etc.


Display interface 190 is configured to transmit image data from computer 110 to monitor or display 192.


As noted above, software modules 164 may include a neural network that includes one or more ANNs, RNNs, LSTMs, etc.


An ANN models the relationships between input data or signals and output data or signals using a network of interconnected nodes that is trained through a learning process. The nodes are arranged into various layers, including, for example, an input layer, one or more hidden layers, and an output layer. The input layer receives input data, such as, for example, image data, sensor time series data, etc., and the output layer generates output data, such as, for example, a probability that the image data contains a known object, a medical condition, etc. Each hidden layer provides at least a partial transformation of the input data to the output data. A deep neural network (DNN) has multiple hidden layers in order to model complex, nonlinear relationships between input data and output data.


In a fully-connected, feedforward ANN, each node is connected to all of the nodes in the preceding layer, as well as to all of the nodes in the subsequent layer. For example, each input layer node is connected to each hidden layer node, each hidden layer node is connected to each input layer node and each output layer node, and each output layer node is connected to each hidden layer node. Additional hidden layers are similarly interconnected. Each connection has a weight value, and each node has an activation function, such as, for example, a linear function, a step function, a sigmoid function, a hyperbolic tangent (tanh) operation, a rectified linear unit (ReLu) function, etc., that determines the output of the node based on the weighted sum of the inputs to the node. The input data propagates from the input layer nodes, through respective connection weights to the hidden layer nodes, and then through respective connection weights to the output layer nodes. The sigmoid function outputs a number between 0 and 1 and the tanh operation outputs a number between −1 and 1 for any given input, while the ReLu function outputs 0 for any negative input and passes any positive input through unchanged.


More particularly, at each input node, input data is provided to the activation function for that node, and the output of the activation function is then provided as an input data value to each hidden layer node. At each hidden layer node, the input data value received from each input layer node is multiplied by a respective connection weight, and the resulting products are summed or accumulated into an activation signal value that is provided to the activation function for that node. The output of the activation function is then provided as an input data value to each output layer node. At each output layer node, the output data value received from each hidden layer node is multiplied by a respective connection weight, and the resulting products are summed or accumulated into an activation signal value that is provided to the activation function for that node. The output of the activation function is then provided as output data. Additional hidden layers may be similarly configured to process data.
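
For illustration only, the forward pass described above may be sketched in a few lines of Python; the layer sizes, random weights, and choice of a sigmoid activation below are hypothetical and are not taken from the disclosure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # Fully-connected feedforward pass: each layer applies its activation
    # function to the weighted sum of its inputs.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Hypothetical 3-5-5-5-3 network (three inputs, three hidden layers of five
# nodes each, three outputs).
rng = np.random.default_rng(0)
sizes = [3, 5, 5, 5, 3]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
output = forward(np.array([0.2, -0.1, 0.7]), weights, biases)
```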



FIG. 2 depicts ANN 200, in accordance with embodiments of the present disclosure.


ANN 200 includes input layer 210, one or more hidden layers, e.g., hidden layers 2201, 2202, . . . , 220N, and output layer 230. Input layer 210 includes one or more input nodes, e.g., Node1,1, Node1,2, . . . , Node1,i. Hidden layer 2201 includes one or more hidden nodes, e.g., Node1,1, Node1,2, . . . , Node1,j. Hidden layer 2202 includes one or more hidden nodes, e.g., Node2,1, Node2,2, . . . , Node2,k. Hidden layer 220N includes one or more hidden nodes, e.g., NodeN,1, NodeN,2, . . . , NodeN,n. Output layer 230 includes one or more output nodes, e.g., NodeO,1, NodeO,2, . . . , NodeO,o. In the example depicted in FIG. 2, there are N hidden layers; input layer 210 includes "i" nodes, hidden layer 2201 includes "j" nodes, hidden layer 2202 includes "k" nodes, hidden layer 220N includes "n" nodes, and output layer 230 includes "o" nodes.


In certain embodiments, N equals 3, "i" equals 3, "j", "k" and "n" equal 5 and "o" equals 3. Input Node1,1, Node1,2 and Node1,3 are each coupled to hidden Node1,1, Node1,2, Node1,3, Node1,4 and Node1,5. Hidden Node1,1, Node1,2, Node1,3, Node1,4 and Node1,5 are each coupled to hidden Node2,1, Node2,2, Node2,3, Node2,4 and Node2,5. Hidden Node2,1, Node2,2, Node2,3, Node2,4 and Node2,5 are each coupled to hidden Node3,1, Node3,2, Node3,3, Node3,4 and Node3,5. Hidden Node3,1, Node3,2, Node3,3, Node3,4 and Node3,5 are each coupled to output NodeO,1, NodeO,2 and NodeO,3.


Many other variations of input, hidden and output layers are clearly possible, including hidden layers that are locally-connected, rather than fully-connected, to one another.


Training an ANN includes optimizing the connection weights between nodes by minimizing the prediction error of the output data until the ANN achieves a particular level of accuracy. One method is backpropagation, or backward propagation of errors, which iteratively and recursively determines a gradient (i.e., a partial derivative of the error function) with respect to each weight, and then adjusts each weight to improve the performance of the network.
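
As a hedged illustration of the weight-update step, the Python sketch below performs one gradient-descent iteration for a single linear node trained with a squared-error loss; the input, target, and learning rate are hypothetical placeholders.

```python
import numpy as np

# One backpropagation-style update: w <- w - lr * dE/dw for error E = 0.5 * (y - target)^2.
x = np.array([0.5, 1.0, -0.3])   # hypothetical input
target = 0.8                     # hypothetical desired output
w = np.zeros(3)                  # connection weights
lr = 0.1                         # hypothetical learning rate

y = w @ x                        # prediction of the single linear node
grad = (y - target) * x          # partial derivative of the error with respect to each weight
w -= lr * grad                   # adjust each weight to reduce the prediction error
```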


A multi-layer perceptron (MLP) is a fully-connected ANN that has an input layer, an output layer and one or more hidden layers. MLPs may be used to process sequential data in applications such as natural language processing, machine translation, speech recognition, etc. Other ANNs include RNNs, LSTM networks, CNNs, etc.



FIG. 3A depicts one view of RNN 300, in accordance with embodiments of the present disclosure.


Generally, RNNs process input sequence data and generate output sequence data, and may be used for many different applications, such as, for example, natural language processing applications (e.g., sentiment analysis, speech recognition, reading comprehension, summarization and translation, etc.), image processing (e.g., image captioning, video classification, etc.), etc. RNNs may be programmed to process many different types of input and output data, such as, for example, fixed input data and fixed output data for image classification, etc., fixed input data and sequential output data for image captioning, etc., sequential input data and fixed output data for sentence “sentiment” classification, etc., sequential input data and sequential output data for machine translation, etc., synced sequential input data and sequential output data for video classification, etc.


RNN 300 includes input layer 310, one or more hidden layers, such as hidden recurrent layer 320, and output layer 330. Generally, an RNN may include one to four hidden recurrent layers; other numbers of hidden recurrent layers are also supported.


Input layer 310 includes one or more input nodes, such as Node1,1 and Node1,2, that present the input data X to hidden recurrent layer 320 as sequences of input data values, such as, for example, sequences of letters, words, sentences, etc., sequences of measured data values, sensor data values, etc. Generally, each sequence is a time step, and the input data are processed as vectors or matrices. RNN 300 processes the input data values for each time step, and typically executes a loop to process the total number of time steps.


Hidden recurrent layer 320 is a fully connected, recurrent layer that includes hidden recurrent nodes, such as, for example, NodeR,1, NodeR,2, NodeR,3, NodeR,4, . . . , NodeR,r. Each hidden recurrent node maintains or stores a state for a hidden state vector h for this layer, which is updated at each time step of RNN 300. In other words, the hidden state vector h includes a state for each hidden recurrent node in hidden recurrent layer 320. In many embodiments, the size of the hidden state vector h ranges from tens or hundreds to a few thousand elements, such as, for example, 64, 256, 4,096, etc. elements. In certain embodiments, the hidden state vector h may be subsampled to reduce processing requirements.


One or more additional, fully-connected, hidden recurrent layers may follow hidden recurrent layer 320. Each successive, hidden recurrent layer includes hidden recurrent nodes and a corresponding hidden state vector h. The last hidden layer, e.g., hidden recurrent layer 320 depicted in FIG. 3A, presents the hidden state vector h to output layer 330.


Output layer 330 is a fully-connected layer that includes one or more output nodes, e.g., Node0,1, that generate the output data Y. In certain embodiments, each output node provides an output, such as a predicted class score, probability of a word, sentence, etc., predicted data value, predicted correlation value, etc. A normalization function, such as a Softmax function, may be applied to the output by output layer 330, or, alternatively, by an additional fully-connected layer interposed between the last hidden layer and output layer 330.



FIG. 3B depicts another view of RNN 300, in accordance with embodiments of the present disclosure.


Input layer 310 is depicted as a single element 310′ including the input data X, hidden recurrent layer 320 is depicted as a single element, module or cell 320′ including the hidden state vector h, and output layer 330 is depicted as a single element 330′ including the output data Y.



FIG. 3C depicts another view of RNN 300, in accordance with embodiments of the present disclosure.


The view of RNN 300 depicted in FIG. 3B has been rotated and annotated to indicate the processing configuration of RNN 300 at time step t, i.e., input layer 310′ including the input data Xt, hidden recurrent module 320′ including the hidden state vector ht, and output layer 330′ including the output data Yt. In many embodiments, input data Xt is a vector having the same dimension as hidden state vector ht.



FIG. 3D depicts another view of RNN 300, in accordance with embodiments of the present disclosure.


As noted above, RNN 300 typically executes a loop so that hidden recurrent module 320′ may process the input data X and update the hidden state vector h at each time step. In the view depicted in FIG. 3D, the loop has been “unrolled” and three time steps are shown, i.e., t−1, t and t+1. Accordingly, RNN 300 may be viewed as a chain of repeating hidden recurrent modules or cells 320′.


At time step t−1, the input data Xt−1, the input hidden state vector ht−2 from the previous time step, the hidden state vector ht−1, and the output data Yt−1 are shown. At time step t, the input data Xt, the input hidden state vector ht−1 from the previous time step, the hidden state vector ht, and the output data Yt are shown. At time step t+1, the input data Xt+1, the input hidden state vector ht from the previous time step, the hidden state vector ht+1, and the output data Yt+1 are shown.


Generally, the hidden state vector ht may be updated by applying an activation function fc to the sum of a weight vector Wstate multiplied by the hidden state vector ht−1 from the previous time step, and a weight vector Wdata multiplied by the input data Xt, as given by Equation 1.












ht = fc(Wstate · ht−1 + Wdata · Xt)        Eq. 1








The activation function ƒc may be a non-linear activation function, such as, for example, tanh( ), ReLu, etc., applied to each element of the hidden state vector h. In certain embodiments, a bias bc may be added to the sum prior to the application of the activation function ƒc. The output data Yt is the product of a weight vector Woutput multiplied by the hidden state vector ht, as given by Equation 2.












Yt = Woutput · ht        Eq. 2








In certain embodiments, an activation function ƒo may be applied to the product of the weight vector Woutput and the hidden state vector ht, such as, for example, tanh( ), ReLu, etc., to generate the output data Yt, as given by Equation 3.












Yt = fo(Woutput · ht)        Eq. 3








In certain embodiments, a bias bo may be added to the product prior to the application of the activation function ƒo.



FIG. 3E depicts a data flow diagram 302 for hidden recurrent module 320′, in accordance with embodiments of the present disclosure.


Hidden recurrent module 320′ is shown at time step t. Hidden recurrent module 320′ includes tanh or sigmoid layer 322, which receives hidden state vector ht−1 and input data vector Xt, applies a tanh operation to the sum of the weight vector Wstate multiplied by the hidden state vector ht−1 from the previous time step, and the weight vector Wdata multiplied by the input data Xt, to generate hidden state vector ht, as given by Equation 1. The hidden state vector ht is output to output layer 330, and provided to, or stored for use by, the next time step.
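
A minimal Python sketch of the hidden recurrent module update in Equations 1-3 is shown below; the vector dimensions, random weights, and omitted biases are hypothetical and only illustrate the computation.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_state, W_data, W_output):
    # Eq. 1: update the hidden state from the previous state and the current input.
    h_t = np.tanh(W_state @ h_prev + W_data @ x_t)
    # Eq. 3: apply an output activation to the weighted hidden state.
    y_t = np.tanh(W_output @ h_t)
    return h_t, y_t

# Hypothetical sizes: 4-element hidden state vector and 4-element input vector.
rng = np.random.default_rng(1)
W_state, W_data, W_output = (rng.standard_normal((4, 4)) for _ in range(3))
h = np.zeros(4)
for x_t in rng.standard_normal((10, 4)):   # loop over 10 hypothetical time steps
    h, y = rnn_step(x_t, h, W_state, W_data, W_output)
```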


Similar to ANNs, training an RNN includes optimizing the weights by minimizing the prediction error of the output data until the RNN achieves a particular level of accuracy. As noted above, backpropagation through time may be used to iteratively and recursively determine a gradient (i.e., a partial derivative of the error function) with respect to each weight, and then adjust each weight to improve the performance of the RNN. However, when the gradient for one or more of the weights becomes too small (i.e., when the gradient “vanishes”), these weights are not adjusted and training eventually stops. This issue is known as the vanishing gradient problem.


An LSTM network is a variation of an RNN that, among other advantages, addresses the vanishing gradient problem by increasing the complexity of each hidden recurrent module or cell in order to generate and maintain more information than just the hidden state vector h, i.e., a cell state vector C. LSTM networks also avoid the RNN's long-term dependency problem.



FIG. 4A depicts a view of LSTM network 400, in accordance with embodiments of the present disclosure.


LSTM network 400 also typically executes a loop so that LSTM module or cell 420 may process each time step. In the view depicted in FIG. 4A, the loop has been “unrolled” and three time steps are shown, similar to the view of RNN 300 depicted in FIG. 3D. Accordingly, LSTM network 400 may also be viewed as a chain of repeating LSTM cells 420.


At time step t−1, the input data Xt−1, the input hidden state vector ht−2 and the input cell state vector Ct−2 from the previous time step, the hidden state vector ht−1, the cell state vector Ct−1, and the output data Yt−1 are shown. At time step t, the input data Xt, the input hidden state vector ht−1 and the input cell state vector Ct−1 from the previous time step, the hidden state vector ht, the cell state vector Ct, and the output data Yt are shown. At time step t+1, the input data Xt+1, the input hidden state vector ht and the input cell state vector Ct from the previous time step, the hidden state vector ht+1, the cell state vector Ct+1, and the output data Yt+1 are shown.



FIG. 4B depicts a data flow diagram 402 of LSTM cell 420, in accordance with embodiments of the present disclosure.


LSTM cell 420 is shown at time step t. LSTM cell 420 receives or retrieves cell state vector Ct−1 and hidden state vector ht−1 from the previous time step, receives input data Xt from input layer 410 for the current time step, processes these data to generate the cell state vector Ct and the hidden state vector ht for the current time step, sends the hidden state vector ht for the current time step to output layer 430, and sends or stores the hidden state vector ht and the cell state vector Ct for the next time step.


LSTM cell 420 includes, inter alia, cell storage (not shown for clarity), forget gate 440, input gate 450, output gate 460, and cell state update segment 470. LSTM cell 420 may be implemented by software modules, processes, routines, etc., by hardware components, circuits, etc., by a combination of hardware and software components, etc.


Forget gate 440 determines which elements of the cell state vector Ct−1 should be discarded (i.e., “forgotten”) or kept (i.e., “remembered”) based on the hidden state vector ht−1 and the input data Xt. Input gate 450 generates new information to be added to the cell state vector Ct−1 based on the hidden state vector ht−1 and the input data Xt. Cell state update segment 470 updates the cell state vector Ct−1, based on the output of forget gate 440 and input gate 450, to generate the cell state vector Ct. Output gate 460 generates the hidden state vector ht based on the hidden state vector ht−1, the input data Xt and the updated cell state vector Ct.



FIG. 4C depicts a data flow diagram 404 for LSTM cell 420, in accordance with embodiments of the present disclosure.


LSTM cell 420 is shown at time step t. Hidden state vector ht−1 and input data vector Xt are provided to forget gate 440, input gate 450, and output gate 460. In many embodiments, concatenation operation 422 concatenates the hidden state vector ht−1 and input data vector Xt to form a concatenated input vector [ht−1, Xt], which is provided to forget gate 440, input gate 450, and output gate 460. In other embodiments, the hidden state vector ht−1 and input data vector Xt are provided separately to forget gate 440, input gate 450, and output gate 460.


Forget gate 440 includes sigmoid layer 442, which receives the concatenated input vector [ht−1, Xt] from concatenation operation 422, applies a concatenated weight vector Wf to the concatenated input vector [ht−1, Xt] to generate a weighted concatenated input vector Wf·[ht−1, Xt], and applies the sigmoid function to the weighted concatenated input vector Wf·[ht−1, Xt] to generate the activation vector ft, as given by Equation 4. The concatenated weight vector Wf is a concatenation of a weight vector for the hidden state vector ht−1 and a weight vector for the input data vector Xt.












ft = σ(Wf · [ht−1, Xt])        Eq. 4








In certain embodiments, a bias bf may be added to the weighted concatenated input vector Wf·[ht−1, Xt] prior to the application of the sigmoid function σ. Sigmoid layer 442 provides the activation vector ft to element-wise multiplication operation 476 within cell state update segment 470.


Input gate 450 includes sigmoid layer 452, tanh layer 454 and element-wise multiplication operation 456. Sigmoid layer 452 receives the concatenated input vector [ht−1, Xt] from concatenation operation 422, applies a concatenated weight vector Wi to the concatenated input vector [ht−1, Xt] to generate a weighted concatenated input vector Wi·[ht−1, Xt], and applies the sigmoid function to the weighted concatenated input vector Wi·[ht−1, Xt] to generate the activation vector it, as given by Equation 5.












it = σ(Wi · [ht−1, Xt])        Eq. 5








In certain embodiments, a bias bi may be added to the weighted concatenated input vector Wi·[ht−1, Xt] prior to the application of the sigmoid function σ. Sigmoid layer 452 provides the activation vector it to element-wise multiplication operation 456.


Tanh layer 454 receives the concatenated input vector [ht−1, Xt] from concatenation operation 422, applies a concatenated weight vector WC to the concatenated input vector [ht−1, Xt] to generate a weighted concatenated input vector WC·[ht−1, Xt], and applies the tanh operation to the weighted concatenated input vector WC·[ht−1, Xt] to generate activation vector {tilde over (C)}t, as given by Equation 6.













{tilde over (C)}t = tanh(WC · [ht−1, Xt])        Eq. 6








In certain embodiments, a bias bC may be added to the weighted concatenated input vector WC·[ht−1, Xt] prior to the application of the tanh operation. Tanh layer 454 provides the activation vector {tilde over (C)}t to element-wise multiplication operation 456. Element-wise multiplication operation 456 multiplies the activation vector it and the activation vector {tilde over (C)}t to generate an intermediate product, which is provided to element-wise addition operation 478 within cell state update segment 470.


Output gate 460 includes sigmoid layer 462, element-wise multiplication operation 466, and element-wise tanh operation 464. Sigmoid layer 462 receives the concatenated input vector [ht−1, Xt] from concatenation operation 422, applies a concatenated weight vector Wo to the concatenated input vector [ht−1, Xt] to generate a weighted concatenated input vector Wo·[ht−1, Xt], and applies the sigmoid function to the weighted concatenated input vector Wo·[ht−1, Xt] to generate the activation vector ot, as given by Equation 7.












ot = σ(Wo · [ht−1, Xt])        Eq. 7








In certain embodiments, a bias bo may be added to the weighted concatenated input vector Wo·[ht−1, Xt] prior to the application of the sigmoid function σ. Sigmoid layer 462 provides the activation vector ot to element-wise multiplication operation 466.


Tanh operation 464 receives the cell state vector Ct, applies the tanh operation to the cell state vector Ct, and provides the result to element-wise multiplication operation 466, which multiplies the outputs of sigmoid layer 462 and tanh operation 464 to generate the hidden state vector ht, as given by Equation 8.












ht = ot · tanh(Ct)        Eq. 8








Element-wise multiplication operation 476 within cell state update segment 470 receives the cell state vector Ct−1, and multiplies activation vector ft and cell state vector Ct−1 to generate an intermediate vector product, which is provided to element-wise addition operation 478. The intermediate vector products generated by element-wise multiplication operation 476 and element-wise multiplication operation 456 are added together to generate the cell state vector Ct, as given by Equation 9.












Ct = ft · Ct−1 + it · {tilde over (C)}t        Eq. 9








The hidden state vector ht is output to output layer 430. The hidden state vector ht and the cell state vector Ct are provided to, or stored for use by, the next time step.
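
The gate computations in Equations 4 through 9 can be sketched in Python as follows; this is an illustrative implementation with hypothetical dimensions, random weights, and omitted biases, not the disclosed module itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_C, W_o):
    # One LSTM cell time step following Equations 4-9 (biases omitted).
    z = np.concatenate([h_prev, x_t])    # concatenated input [h_{t-1}, X_t]
    f_t = sigmoid(W_f @ z)               # Eq. 4: forget gate activation
    i_t = sigmoid(W_i @ z)               # Eq. 5: input gate activation
    c_tilde = np.tanh(W_C @ z)           # Eq. 6: candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # Eq. 9: cell state update
    o_t = sigmoid(W_o @ z)               # Eq. 7: output gate activation
    h_t = o_t * np.tanh(c_t)             # Eq. 8: hidden state
    return h_t, c_t

# Hypothetical sizes: 8-element hidden and cell state vectors, 4-element input vector.
rng = np.random.default_rng(2)
W_f, W_i, W_C, W_o = (rng.standard_normal((8, 12)) for _ in range(4))
h, c = np.zeros(8), np.zeros(8)
h, c = lstm_step(rng.standard_normal(4), h, c, W_f, W_i, W_C, W_o)
```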



FIG. 5 depicts a data flow diagram 502 for DP-GAN 500, in accordance with embodiments of the present disclosure.


As noted above, GANs may be used to generate synthetic data based on original data. Generally, GANs include a generator neural network and a discriminator neural network. The generator neural network learns from the original data and works to generate synthetic data. The discriminator neural network receives samples of both original (real) data and synthetic (fake) data, and “guesses” whether each sample is real or fake. The generator neural network and the discriminator neural network are trained adversarially, i.e., against each other. The generator neural network attempts to fool the discriminator neural network into guessing that the synthetic data is real, and the discriminator neural network attempts to become very good at guessing which samples are actually real or fake. When the training is successful, the generator neural network becomes very good at generating synthetic data that fools the discriminator neural network into guessing that the synthetic data is real.


Differential privacy is a formal notion of privacy that bounds the risk to any person who provides data for subsequent processing. In a DP-GAN, noise is drawn from carefully designed distributions and applied to the weights of the generator neural network and the discriminator neural network to protect the privacy of the individuals associated with the data. From one perspective, the addition of noise prevents the DP-GAN's generator and discriminator neural networks from memorizing or disclosing any sensitive or personal information from the original data.
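
As a loose Python illustration of perturbing network weights with noise, the snippet below adds Gaussian noise to every parameter of a PyTorch model; the noise scale and its placement after each update are hypothetical and do not reproduce the specific differential-privacy mechanism or noise calibration of the disclosure.

```python
import torch

def add_weight_noise(model, sigma=0.01):
    # Illustrative only: perturb each weight with zero-mean Gaussian noise of scale sigma.
    with torch.no_grad():
        for param in model.parameters():
            param.add_(torch.randn_like(param) * sigma)
```

In practice, add_weight_noise(generator) and add_weight_noise(discriminator) might be called after each optimizer step; a real differentially private mechanism would calibrate sigma to a target privacy budget and typically also bound each update, for example by clipping gradients.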


DP-GAN 500 includes motif causality module 510, autoencoder module 520, generator module 530, discriminator module 540, preprocessor module 526 and postprocessor module 528. Autoencoder module 520 includes embedder module 522 and recovery module 524. Each module may include, inter alia, one or more neural networks, such as RNNs, LSTM networks, etc., implemented as software modules, processes, routines, etc. In many embodiments, these software components may be executed by processor(s) 130. In certain embodiments, at least a portion of these software components may be executed by GPU 140 or NPU 150. In other embodiments, these software components may be executed by a combination of processor 130 and GPU 140, processor 130 and NPU 150, or processor 130, GPU 140 and NPU 150.


In many embodiments, medical longitudinal time series data (traces) for a population of patients (persons) may be divided into two sets, i.e., set A and set B. Each set includes a number of traces for a different group of patients (persons) within the population. Set A may include the same number of traces as set B, set A may include fewer traces than set B, or set A may include more traces than set B. In certain embodiments, the medical longitudinal time series data may be entirely divided into set A and set B, while in other embodiments, the medical longitudinal time series data may be partially divided into set A and set B based upon a selection criterion. For example, a patient's medical longitudinal time series data may be selected based on quality. As an example, un-selected patient medical longitudinal time series data may exhibit undesirable characteristics such as measurement noise, data dropouts, etc. Generally, a patient's medical longitudinal time series data are bounded and include at least 50 time steps. Certain medical longitudinal time series data may include 100 or more time steps, such as, for example, 288 time steps for 24 hours of continuous glucose monitoring (CGM) data (i.e., 12 measurements/hour).


Original longitudinal time series data (set A) 550 are provided to motif causality module 510 as original data xm (data 553). Original longitudinal time series data (set B) 560 are provided to preprocessor module 526 as original data x (data 563).


Motif causality module 510 includes a data processing module (not shown for clarity) that processes original data xm to generate a number of non-overlapping motif data partitions. Each motif data partition is provided to a different motif network 512. Each motif network 512 generates a motif causality matrix Mi; these matrices are aggregated into an aggregated motif causality matrix MA that is provided to generator module 530. During training, motif networks 512 learn the relationships amongst motifs and express these relationships in causality matrices, which are aggregated into an aggregated motif causality matrix MA to preserve patient privacy. Motif causality module 510 is discussed in more detail below.


Preprocessor module 526 preprocesses original data x to generate batched original data x, and provides the batched original data x to embedder module 522.


Embedder module 522 reduces the dimensionality of the batched original data x to generate an embedded set of traces, i.e., embedded original data xe (data 523). In many embodiments, embedder module 522 includes a neural network with an input layer, at least one hidden layer, such as, for example, an RNN layer (e.g., hidden recurrent layer 320, hidden recurrent module 320′, etc.) or LSTM layer (e.g., LSTM cell 420, etc.), and an output layer. In certain embodiments, the neural network may be a CNN. Other neural network architectures are also supported.


Generator module 530 generates embedded synthetic data {circumflex over (x)}e (data 533) based on embedded original data xe and aggregated motif causality matrix 513 (MA). In many embodiments, generator module 530 includes a neural network with an input layer, at least one hidden layer, such as, for example, an RNN layer (e.g., hidden recurrent layer 320, hidden recurrent module 320′, etc.) or LSTM layer (e.g., LSTM cell 420, etc.), and an output layer. Other neural network architectures are also supported.


Postprocessor module 528 reconstructs the embedded synthetic data {circumflex over (x)}e in the original data space to generate synthetic data {circumflex over (x)} (data 573), which may be output as synthetic longitudinal time series data 570.


Recovery module 524 reconstructs the embedded original data xe in the original data space to generate recovered original data {tilde over (x)} (data 525). In many embodiments, recovery module 524 includes a neural network with an input layer, at least one hidden layer, such as, for example, an RNN layer (e.g., hidden recurrent layer 320, hidden recurrent module 320′, etc.) or LSTM layer (e.g., LSTM cell 420, etc.), and an output layer. In certain embodiments, the neural network may be a CNN. Other neural network architectures are also supported.


Discriminator module 540 receives the embedded original data xe and guesses whether the embedded original data xe is real or fake. Similarly, discriminator module 540 also receives the embedded synthetic data {circumflex over (x)}e and guesses whether the embedded synthetic data {circumflex over (x)}e is real or fake. The guesses may be output as embedded original guesses 543 and embedded synthetic guesses 545. In many embodiments, discriminator module 540 includes a neural network with an input layer, at least one hidden layer, such as, for example, an RNN layer (e.g., hidden recurrent layer 320, hidden recurrent module 320′, etc.) or LSTM layer (e.g., LSTM cell 420, etc.), and an output layer. Other neural network architectures are also supported.


During training, carefully calibrated noise (not shown for clarity) is added to the weights of autoencoder module 520 (i.e., embedder module 522 and recovery module 524), generator module 530 and discriminator module 540 to ensure each network upholds differential privacy, e.g., satisfies a privacy metric, to preserve patient privacy. To generate the synthetic data, weight noise generator 580 generates weight noise (Z) 583, which is received as input to generator module 530 and passed through recovery module 524 to postprocessor module 528 which outputs the final synthetic data {circumflex over (x)}. In many embodiments, weight noise (Z) 583 is a random vector of noise.


The embedded original data xe and the embedded synthetic data {circumflex over (x)}e are used to train generator module 530 and discriminator module 540 rather than the original data x and the synthetic data {circumflex over (x)}. By reducing the dimensionality of the space in which generator module 530 and discriminator module 540 learn, these networks focus on and learn the most important parts or motifs of the traces.
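
A highly simplified Python (PyTorch) sketch of how the embedder, recovery, generator, and discriminator modules might each be built from an LSTM layer followed by an output layer is given below; the class name, layer sizes, single-layer structure, and activations are assumptions for illustration only, not the disclosed architecture.

```python
import torch
import torch.nn as nn

class LSTMBlock(nn.Module):
    # Generic stand-in for the embedder, recovery, generator, and discriminator
    # modules: one LSTM hidden layer followed by a linear output layer.
    def __init__(self, in_dim, hidden_dim, out_dim, out_activation=None):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, out_dim)
        self.act = out_activation

    def forward(self, x):
        h, _ = self.rnn(x)
        y = self.out(h)
        return self.act(y) if self.act is not None else y

# Hypothetical dimensions: univariate traces and a 16-dimensional embedding space.
embedder      = LSTMBlock(1, 32, 16, torch.sigmoid)    # x  -> embedded original data
recovery      = LSTMBlock(16, 32, 1)                   # embedded data -> recovered data
generator     = LSTMBlock(16, 32, 16, torch.sigmoid)   # Z  -> embedded synthetic data
discriminator = LSTMBlock(16, 32, 1, torch.sigmoid)    # embedded data -> real/fake guess

z = torch.randn(8, 24, 16)                 # weight noise Z: 8 sequences of 24 steps
embedded_synthetic = generator(z)
synthetic = recovery(embedded_synthetic)   # passed through recovery toward the original space
```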



FIG. 6 depicts a loss function diagram 600 for training DP-GAN 500 depicted in FIG. 5, in accordance with embodiments of the present disclosure.


In many embodiments, the modules of DP-GAN 500 may be trained in a particular sequence. First, motif causality module 510 is trained, using original data xm (data 553), to generate aggregated motif causality matrix 513 (MA). The remaining modules of DP-GAN 500 are then trained in sequence (e.g., within each epoch), using original data x (data 563) and aggregated motif causality matrix 513 (MA), to generate synthetic data {circumflex over (x)}. In certain embodiments, autoencoder module 520 is trained, then generator module 530 and discriminator module 540 are adversarially trained, and then embedder module 522 of autoencoder module 520 is trained a second time.


In many embodiments, six loss functions are used to train autoencoder module 520, generator module 530 and discriminator module 540, including reconstruction loss (LR) 610, stepwise loss (LS) 620, distributional loss (LD) 630, motif causality loss (LM) 640, adversarial loss fake (LAf) 650, and adversarial loss real (LAr) 660. Other loss functions, as well as subsets of these loss functions, are also supported.


Reconstruction loss (LR) 610 is the root mean square error (RMSE) between original data x and recovered original data {tilde over (x)}. A “perfect” autoencoder perfectly reconstructs the original data, such that x={tilde over (x)}.


Stepwise loss (LS) 620 is the mean square error (MSE) between batches of embedded original data xe and batches of embedded synthetic data {circumflex over (x)}e. Generator module 530 compares, and learns to correct, the discrepancies between stepwise data distributions using stepwise loss (LS) 620. In other words, generator module 530 learns to better generate the next time step batch of data by looking at the difference between its generated next step and the real next step.


Distributional loss (LD) 630 is the moments loss between the distribution of original data x and the distribution of synthetic data {circumflex over (x)}. Generator module 530 learns to generate a diverse set of traces, and not the same type of trace over and over again, using distributional loss (LD) 630.


Motif causality loss (LM) 640 is the MSE between motif causality matrix computed on original data, Mx, and motif causality matrix computed on synthetic data, M{circumflex over (x)}. Generator module 530 computes the motif causality matrix M{circumflex over (x)} after the set of embedded synthetic data {circumflex over (x)}e is run back through recovery module 524 and postprocessor module 528 to generate the synthetic data in the original space {circumflex over (x)}. Generator module 530 learns to generate synthetic data that yields a realistic causal matrix (thereby identifying appropriate causal relationships from the motifs), and implicitly learns not to generate unrealistic motif sequences, using motif causality loss (LM) 640.


Adversarial loss fake (LAf) 650 is the binary cross entropy (BCE) between the discriminator guesses on the synthetic data {circumflex over (x)}, i.e., embedded synthetic guesses 545, and the ground truth, i.e., a vector of 1's.


Adversarial loss real (LAr) 660 is the BCE between the discriminator guesses on the original data x, i.e., embedded original guesses 543, and the ground truth, i.e., a vector of 0's.


Autoencoder module 520 is trained to minimize a weighted combination of reconstruction loss (LR) 610 and stepwise loss (LS) 620 (α is a weight hyperparameter), as given by Equation 10, in order to avoid overspecialization.











Minimize [LR + α LS]        Eq. 10








Generator module 530 is trained to minimize a weighted combination of stepwise loss (LS) 620, distributional loss (LD) 630, motif causality loss (LM) 640 and adversarial loss fake (LAf) 650 (η is a weight hyperparameter), as given by Equation 11. Stepwise loss (LS) 620 enables the dual training of autoencoder module 520 and generator module 530.











Minimize [(1 − LAf) + η LS + η LD + LM]        Eq. 11








Discriminator module 540 is trained to minimize a weighted combination of adversarial loss fake (LAf) 650 and adversarial loss real (LAr) 660, as given by Equation 12.











Minimize [LAf + LAr]        Eq. 12








In one embodiment, α is 0.1 and η is 10; other values are also supported. These training objectives and loss functions train DP-GAN 500 to generate high quality, long time series synthetic data.
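
A hedged Python sketch of how the three training objectives in Equations 10-12 might be assembled from the six losses follows; the individual loss implementations are simplified stand-ins (for example, the moments loss is approximated by a mean and standard deviation comparison) and every tensor argument is a placeholder.

```python
import torch
import torch.nn.functional as F

alpha, eta = 0.1, 10.0   # weight hyperparameters from the embodiment described above

def reconstruction_loss(x, x_rec):        # LR: RMSE between original and recovered data
    return torch.sqrt(F.mse_loss(x_rec, x))

def stepwise_loss(x_e, x_e_hat):          # LS: MSE between embedded batches
    return F.mse_loss(x_e_hat, x_e)

def distributional_loss(x, x_hat):        # LD: simplified moments comparison
    return (x.mean() - x_hat.mean()).abs() + (x.std() - x_hat.std()).abs()

def motif_causality_loss(M_x, M_x_hat):   # LM: MSE between motif causality matrices
    return F.mse_loss(M_x_hat, M_x)

def adversarial_loss_fake(d_fake):        # LAf: BCE of guesses on synthetic data vs. 1's
    return F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))

def adversarial_loss_real(d_real):        # LAr: BCE of guesses on original data vs. 0's
    return F.binary_cross_entropy(d_real, torch.zeros_like(d_real))

def autoencoder_objective(x, x_rec, x_e, x_e_hat):                      # Eq. 10
    return reconstruction_loss(x, x_rec) + alpha * stepwise_loss(x_e, x_e_hat)

def generator_objective(d_fake, x_e, x_e_hat, x, x_hat, M_x, M_x_hat):  # Eq. 11
    return ((1 - adversarial_loss_fake(d_fake))
            + eta * stepwise_loss(x_e, x_e_hat)
            + eta * distributional_loss(x, x_hat)
            + motif_causality_loss(M_x, M_x_hat))

def discriminator_objective(d_fake, d_real):                            # Eq. 12
    return adversarial_loss_fake(d_fake) + adversarial_loss_real(d_real)
```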



FIG. 7 depicts data flow 700 for generating batched original data for training DP-GAN 500 depicted in FIG. 5, in accordance with embodiments of the present disclosure.


In the exemplary embodiment depicted in FIG. 7, original data 710 includes 100 traces, i.e., trace 7121, . . . , 712100, and each trace includes 288 data values (time steps). For example, trace 7121 includes original data X1,1, . . . , X1,288, and so on; trace 712100 includes original data X100,1, . . . , X100,288. In many embodiments, preprocessor module 526 preprocesses original data x (data 563) to generate batched original data x (as discussed above).


Preprocessor module 526 applies a sliding window (width 24, stride length 1) to each trace in original data 710 to expand each trace into a batched data slice including 264 time chunks, each time chunk including 24 data values (time steps). For trace 7121, the sliding window is applied to the first 24 data values, i.e., data value sequence 7141, to generate time chunk 7241 of batched data slice 7221, which includes X1,1, X1,2, . . . , X1,23, X1,24. The sliding window is then moved one data value position to the right and applied to the next 24 data values, i.e., data value sequence 7142, to generate time chunk 7242 of batched data slice 7221, which includes X1,2, X1,3, . . . , X1,24, X1,25. And so on. Data value sequence 714263 generates time chunk 724263 of batched data slice 7221, which includes X1,263, X1,264, . . . , X1,286, X1,287, and data value sequence 714264 generates time chunk 724264 of batched data slice 7221, which includes X1,264, X1,265, . . . , X1,287, X1,288. The remaining traces are processed in a similar manner; finally, trace 712100 generates batched data slice 722100. Batched original data 720 includes all of the batched data slices 722i, i.e., batched data slice 7221, . . . , batched data slice 722100. Other methods for generating batched original data 720 are also supported.
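
A minimal Python sketch of the sliding-window expansion described above (width 24, stride 1) follows; the trace array is a placeholder, and the exact number of chunks produced depends on the boundary convention applied at the end of each trace.

```python
import numpy as np

def batch_trace(trace, width=24, stride=1):
    # Expand one trace into overlapping time chunks of `width` data values.
    return np.stack([trace[i:i + width]
                     for i in range(0, len(trace) - width + 1, stride)])

trace = np.arange(288, dtype=float)            # placeholder for one 288-step trace
batched_slice = batch_trace(trace)             # rows are the overlapping 24-step time chunks
traces = np.random.rand(100, 288)              # placeholder for 100 original traces
batched_original = np.stack([batch_trace(t) for t in traces])
```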


Embedder module 522 then reduces the dimensionality of the batched original data x to generate an embedded set of traces, i.e., embedded original data xe (data 523). In the exemplary embodiment, embedder module 522 may reduce the number of chunks from 264 to 128 in each batched data slice to generate embedded original data xe.



FIGS. 8A and 8B depict data flow 800 for generating synthetic data 830 by DP-GAN 500 depicted in FIG. 5, in accordance with embodiments of the present disclosure.


In many embodiments, postprocessor module 528 reconstructs the embedded synthetic data {circumflex over (x)}e (data 533) in the original data space to generate synthetic data {circumflex over (x)} (data 573), which may be output as synthetic longitudinal time series data 570 (as discussed above).


In the exemplary embodiment depicted in FIGS. 8A and 8B, embedded synthetic data 810 includes 100 traces, i.e., trace 8121, . . . , 812100, each trace includes 128 time chunks, and each time chunk includes 24 data values (time steps). For example, trace 8121 includes time chunk 8241, . . . , time chunk 824128; time chunk 8241 includes embedded synthetic data Xh1,1, . . . , Xh1,24, . . . , time chunk 824128 includes embedded synthetic data Xh128,1, . . . , Xh128,24 (Xh represents {circumflex over (X)} in FIGS. 8A and 8B).


Postprocessor module 528 first serializes each trace 812i of embedded synthetic data 810 into a single row of reformed embedded synthetic data 820. For example, trace 8121 is formed into serialized trace 8221 by first placing time chunk 8241 into the first row of reformed embedded synthetic data 820, placing time chunk 8242 into the first row of reformed embedded synthetic data 820 after time chunk 8241, and so on, until time chunk 824128 is placed into the first row of reformed embedded synthetic data 820 after time chunk 824127, thereby completing the formation of serialized trace 8221. The indexing for the elements of serialized trace 8221 is shown to transition from time chunk/time step-based indices (e.g., Xh1,1, . . . , Xh1,24, Xh128,1, . . . , Xh128,24, etc.) to trace/time step-based indices (e.g., Xh1,1, . . . , Xh1,3072, Xh100,1, . . . , Xh100,3072, etc.). The remaining traces 812i of embedded synthetic data 810 are serialized in a similar manner, concluding with the formation of serialized trace 822100 from trace 812100.


Postprocessor module 528 then applies a reverse sliding window (i.e., a sliding average) to each serialized trace 822i of reformed embedded synthetic data 820 to reconstruct the embedded synthetic data in the original space of 100 traces, each with 288 data values (time steps). Generally, the reverse sliding window averages groups of data values in each serialized trace 822i, based on the width (t time steps) and stride length (s time steps) of the window, to generate each synthetic trace 832i. For example, serialized trace 8221 is formed into synthetic trace 8321 by applying the reverse sliding window to data values Xh1,1, . . . , Xh1,3072 to generate data values Xh1,1, . . . , Xh1,288. And so on. Finally, serialized trace 822100 is formed into synthetic trace 832100 by applying the reverse sliding window to data values Xh100,1, . . . , Xh100,3072 to generate data values Xh100,1, . . . , Xh100,288. Synthetic data 830 includes synthetic trace 8321, . . . , 832100. Other methods for generating synthetic data 830 are also supported.
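
The reverse sliding window (sliding average) may be sketched in Python as overlap-averaging, as below; the width, stride, and round-trip example are hypothetical values chosen so the arithmetic closes, and the embodiment's exact parameters may differ.

```python
import numpy as np

def reverse_sliding_window(chunks, out_len, width=24, stride=1):
    # Average the overlapping chunks back into a single trace of length out_len.
    total = np.zeros(out_len)
    counts = np.zeros(out_len)
    for j, chunk in enumerate(chunks):
        start = j * stride
        total[start:start + width] += chunk
        counts[start:start + width] += 1
    return total / np.maximum(counts, 1)   # each time step is averaged over its windows

# Hypothetical round trip: expand a 288-step trace with width 24 and stride 1,
# then average the chunks back into 288 time steps.
trace = np.random.rand(288)
chunks = np.stack([trace[i:i + 24] for i in range(288 - 24 + 1)])
recovered = reverse_sliding_window(chunks, out_len=288)
```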



FIG. 9A depicts a data flow diagram 900 for motif causality module 510, in accordance with embodiments of the present disclosure.


Motif causality module 510 includes data processing module 910, a number (N) of motif networks 5121, 5122, . . . , 512N, and motif causality matrix aggregation module 940. During the training of motif causality module 510, data processing module 910 generates a number (N) of non-overlapping motif data partitions 9201, 9202, . . . , 920N from original data xm (data 553). Each motif data partition 920i includes data for different patients from original longitudinal time series data (set A) 550. Each motif network 5121, 5122, . . . , 512N receives a different motif data partition 920i, i.e., motif network 5121 receives motif data partition 9201, motif network 5122 receives motif data partition 9202, and so on.


Each motif network 512i generates a motif causality matrix 930i (Mi) based on the respective motif data partition 920i, i.e., motif network 5121 generates motif causality matrix 9301 (M1) based on the motif data partition 9201, motif network 5122 generates motif causality matrix 9302 (M2) based on the motif data partition 9202, and so on. As noted above, each motif causality matrix 930i (Mi) expresses the relationships among motifs that motif network 512i learns during training. Generally, motif causality matrix 930i (Mi) includes motif causality values c and has a width ≤ m and a height ≤ m (where m is the number of motifs to be analyzed in the data partition, discussed below). Each causality factor cj,k expresses the strength of the relationship between two motifs (e.g., motif j and motif k), and may have values between 0 (i.e., indicating a weak relationship) and 1 (i.e., indicating a strong relationship). Other values are also supported.


Motif causality matrix aggregation module 940 aggregates motif causality matrices 9301 (M1), 9302 (M2), . . . , 930N (MN) into aggregated motif causality matrix 513 (MA) to preserve patient privacy, i.e., to satisfy a privacy metric. Aggregated motif causality matrix 513 (MA) is provided to generator module 530 during its training to focus generator module 530 on retaining the important motifs (events) within the traces of embedded original data xe.
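
The aggregation step is described above only at a high level; one simple, hypothetical realization in Python is an element-wise average of the per-partition matrices, sketched below.

```python
import numpy as np

def aggregate_causality_matrices(matrices):
    # Hypothetical aggregation: element-wise mean of the N motif causality matrices Mi.
    return np.mean(np.stack(matrices), axis=0)

# Placeholder example: N = 4 partitions and m = 3 motifs, with causality values in [0, 1].
M_list = [np.random.rand(3, 3) for _ in range(4)]
M_A = aggregate_causality_matrices(M_list)
```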



FIG. 9B depicts data flow diagram 902 for generating batched motif data 960 for training motif causality module 510 depicted in FIG. 9A, in accordance with embodiments of the present disclosure.


After data processing module 910 generates each motif data partition 9201, 9202, . . . , 920N, data processing module 910 further processes each motif data partition 9201, 9202, . . . , 920N to generate respective batched motif data 960.


In the exemplary embodiment depicted in FIG. 9B, motif data partition 920 includes 100 traces, and each trace includes 288 data values (time steps). For example, the first trace includes original data X1,1, . . . , X1,288, and so on; the last trace includes original data X100,1, . . . , X100,288. Other numbers of traces and numbers of data values (time steps) are also supported.


Motif data partition 920 may be notionally divided into a number of motif blocks, one motif block for each motif to be analyzed. Three motif blocks are depicted, and each motif block includes 96 data values (time steps) for each trace, i.e., motif block 921 for motif 1, motif block 922 for motif 2, and motif block 923 for motif 3. Motif block 921 includes data values X1,1, . . . , X1,96, . . . , X100,1, . . . , X100,96. Motif block 922 includes data values X1,97, . . . , X1,192, . . . , X100,97, . . . , X100,192. Motif block 923 includes data values X1,193, . . . , X1,288, . . . , X100,193, . . . , X100,288. While motif data partition 920 may be divided into 2 motif blocks, motif data partition 920 is typically divided into 3 or more motif blocks.


Data processing module 910 divides motif blocks 921, 922 and 923 into separate motif blocks for each trace, and then stacks the separate motif blocks into motif block stack 950. For the first trace, motif block 921_1 includes data values X1,1, . . . , X1,96, motif block 922_1 includes data values X1,97, . . . , X1,192, and motif block 923_1 includes data values X1,193, . . . , X1,288. And so on. For the last trace, motif block 921_100 includes data values X100,1, . . . , X100,96, motif block 922_100 includes data values X100,97, . . . , X100,192, and motif block 923_100 includes data values X100,193, . . . , X100,288. Accordingly, motif block stack 950 includes 300 motif blocks (3 motif blocks for each of the 100 traces).
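As an illustrative sketch of this partitioning and stacking, assuming the 100-trace by 288-time-step sizes of this example (the variable names are not taken from the disclosure, and the random data merely stand in for motif data partition 920):

import numpy as np

# Stand-in for motif data partition 920: 100 traces x 288 time steps.
rng = np.random.default_rng(2)
partition = rng.random((100, 288))

n_motif_blocks = 3                                  # motif blocks 921, 922, 923
block_len = partition.shape[1] // n_motif_blocks    # 288 / 3 = 96 time steps per block

# Split each trace into its 3 motif blocks, then stack the per-trace blocks:
# 100 traces x 3 blocks = 300 motif blocks of 96 data values each (stack 950).
per_trace_blocks = partition.reshape(100, n_motif_blocks, block_len)
motif_block_stack = per_trace_blocks.reshape(-1, block_len)
print(motif_block_stack.shape)                      # (300, 96)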


Data processing module 910 then applies a sliding window (width 24, stride length 1) to each motif block in motif block stack 950 to expand each motif block into a motif sequence block that includes 73 overlapping motif sequences (96 − 24 + 1 = 73), each motif sequence including 24 data values (time steps).


For motif block 921_1, the sliding window is applied to the first 24 data values to generate motif sequence 964_1 of motif sequence block 962_1, which includes X1,1, X1,2, . . . , X1,23, X1,24. The sliding window is then moved one data value position to the right and applied to the next 24 data values to generate motif sequence 964_2 of motif sequence block 962_1, which includes X1,2, X1,3, . . . , X1,24, X1,25. And so on. For example, motif sequence 964_71 includes X1,71, X1,72, . . . , X1,93, X1,94, motif sequence 964_72 includes X1,72, X1,73, . . . , X1,94, X1,95, and the final motif sequence 964_73 includes X1,73, X1,74, . . . , X1,95, X1,96.
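The sliding-window expansion can be sketched as follows; this is illustrative only, using numpy.lib.stride_tricks.sliding_window_view as one convenient way to enumerate the 73 overlapping length-24 windows over a 96-value motif block, with stand-in data rather than actual measured values.

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Stand-in for one 96-value motif block (e.g., motif block 921_1).
motif_block = np.arange(1, 97, dtype=float)

window = 24                                   # sliding window width, stride length 1
motif_sequences = sliding_window_view(motif_block, window)
print(motif_sequences.shape)                  # (73, 24): 73 overlapping motif sequences
print(motif_sequences[0])                     # stands in for X1,1 .. X1,24
print(motif_sequences[-1])                    # stands in for X1,73 .. X1,96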


For motif block 922_1, the sliding window is applied to the data values X1,97, . . . , X1,192 to generate motif sequence block 962_2 (not shown for clarity). For motif block 923_1, the sliding window is applied to the data values X1,193, . . . , X1,288 to generate motif sequence block 962_3 (not shown for clarity). And so on for the remaining motif blocks. For example, for motif block 923_100, the sliding window is applied to the data values X100,193, . . . , X100,288 to generate motif sequence block 962_300. Other methods for generating batched motif data 960 are also supported.



FIG. 10A depicts a data flow diagram 1000 for motif network 512_i, in accordance with embodiments of the present disclosure.


Motif network 512_i includes a number (m) of neural networks 1010_1, 1010_2, . . . , 1010_m, and weight combination module 1030. The number m is the number of motifs that are being analyzed, as described above. Neural network 1010_1 includes weight matrix 1020_1 (W_1), neural network 1010_2 includes weight matrix 1020_2 (W_2), and so on. Neural network 1010_m includes weight matrix 1020_m (W_m).


Each neural network 1010_j is trained using motif data partition 920_i, as discussed below. Weight combination module 1030 linearly combines weight matrices 1020_1, 1020_2, . . . , 1020_m to generate motif causality matrix 930_i. Generally, each weight matrix 1020_j includes weights w and has a width equal to the sliding window width (e.g., 24 time steps) and a height equal to m.
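The combination performed by weight combination module 1030 can be sketched as follows. As an illustrative assumption only, the strength c_{j,k} is taken here from the magnitude of the lag weights that neural network 1010_j assigns to input motif k, normalized into [0, 1]; the actual combination used by weight combination module 1030 may differ.

import numpy as np

def combine_weight_matrices(weight_matrices):
    """Combine per-network weight matrices W_1..W_m into a motif causality matrix.

    Each W_j is assumed to have shape (m, window): one row of lag weights per
    input motif. As one plausible (illustrative) convention, c[j, k] is taken
    from the magnitude of W_j's row k, normalized into [0, 1].
    """
    m = len(weight_matrices)
    M = np.zeros((m, m))
    for j, W_j in enumerate(weight_matrices):
        row_strength = np.abs(W_j).sum(axis=1)        # one value per input motif
        M[j] = row_strength / (row_strength.max() + 1e-12)
    return M

# Illustrative sizes: m = 3 motifs, window width = 24 time steps.
rng = np.random.default_rng(3)
weights = [rng.normal(size=(3, 24)) for _ in range(3)]
print(combine_weight_matrices(weights).round(3))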



FIG. 10B depicts data flow diagram 1002 for training neural network 1010_j within motif network 512_i depicted in FIG. 10A, in accordance with embodiments of the present disclosure.


In certain embodiments, loss module 1040 and weight adjustment module 1050 may be provided for each neural network 1010_j within motif network 512_i. In other embodiments, loss module 1040 and weight adjustment module 1050 may be provided for motif network 512_i and used to train each neural network 1010_j.


In many embodiments, neural network 1010_j includes an input layer, at least one hidden layer, such as, for example, an RNN layer (e.g., hidden recurrent layer 320, hidden recurrent module 320′, etc.) or an LSTM layer (e.g., LSTM cell 420, etc.), and an output layer. In certain embodiments, a convolutional layer may precede the output layer. Other neural network architectures are also supported.


Generally, neural network 1010_j is trained with respect to a particular "ground truth" motif sequence block 962_j within batched motif data 960 to learn the causal relationships between ground truth motif sequence block 962_j and all the other motif sequence blocks within batched motif data 960. More particularly, neural network 1010_j generates a predicted motif sequence block 1062 based on batched motif data 960. Loss module 1040 determines whether the weights (W_j) for neural network 1010_j should be adjusted by comparing predicted motif sequence block 1062 to ground truth motif sequence block 962_j using a loss function, such as, for example, MSE, RMSE, etc.
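A minimal training sketch is shown below, assuming a PyTorch implementation with an LSTM hidden layer and an MSE loss; the architecture, layer sizes, and random stand-in data are illustrative assumptions, not the disclosed implementation.

import torch
from torch import nn

window = 24                                       # sliding window width (time steps)
batched_motif_data = torch.randn(300, window)     # simplified stand-in for batched motif data 960
ground_truth_block = torch.randn(300, window)     # stand-in for ground truth motif sequence block 962_j

class MotifPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
        self.out = nn.Linear(32, 1)

    def forward(self, x):                         # x: (batch, window)
        h, _ = self.rnn(x.unsqueeze(-1))          # (batch, window, 32)
        return self.out(h).squeeze(-1)            # predicted sequences: (batch, window)

model = MotifPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                            # e.g., MSE between prediction and ground truth

for step in range(5):                             # a few illustrative training steps
    optimizer.zero_grad()
    predicted_block = model(batched_motif_data)   # stands in for predicted motif sequence block 1062
    loss = loss_fn(predicted_block, ground_truth_block)
    loss.backward()                               # compute gradients from the loss
    optimizer.step()                              # adjust the weights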



FIG. 11A depicts motif causality matrix 1100, in accordance with embodiments of the present disclosure.


Motif causality matrix 1100 is a 10×10 matrix which presents motif causality values for 100 pairs of motifs. X-axis 1102 includes 10 motif bins, Y-axis 1104 includes 10 motif bins, and scale 1106 ranges from 0 (i.e., no causal relationship between motifs) to 1 (i.e., a strong causal relationship between motifs). For example, motif causality element 1108 has a value of 0.382 and indicates a somewhat causal relationship between motif 100 (i.e., bin 5 on the X-axis) and motif 281 (i.e., bin 7 on the Y-axis). Motif causality element 1110 has a value of 0.424 and indicates a slightly stronger causal relationship between motif 140 (i.e., bin 6 on the X-axis) and motif 297 (i.e., bin 9 on the Y-axis).



FIG. 11B depicts motif comparisons 1120 and 1130 for motif causality elements 1108 and 1110, respectively, in accordance with embodiments of the present disclosure.


Motif comparison 1120 includes graph 1122 depicting time series data 1124 (i.e., glucose values vs. time) for motif 100, graph 1126 depicting time series data 1128 (i.e., glucose values vs. time) for motif 281, and motif causality element 1108 having a value of 0.382. Similarly, motif comparison 1130 includes graph 1132 depicting time series data 1134 (i.e., glucose values vs. time) for motif 140, graph 1136 depicting time series data 1138 (i.e., glucose values vs. time) for motif 297, and motif causality element 1110 having a value of 0.424.


For medical longitudinal time series data, the sequence of important motifs is more informative than every single previous timestep. Traditional time series data generation methods, such as autoregressive models, assume that the time series is dependent on all previous time steps within the window, and generate a value for x at time t based on a sequence of the previous values of x. Importantly, these methods only conserve temporal relationships (e.g., information from previous time steps within the window), and ignore any other potentially informative relationships within the same time series (e.g., x) or between different time series.



FIG. 12A depicts traditional time series data generation 1200.


The value of x at time step t (i.e., x value 1206) depends on the values of x at time steps t−1 (i.e., x value 1201), t−2 (i.e., x value 1202), t−3 (i.e., x value 1203), t−4 (i.e., x value 1204), and t−5 (i.e., x value 1205). While the previous values for x may be weighted in a linear combination, subject to dropout, etc., traditional methods depend heavily on the window size and miss long-term relationships between different time series.
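For illustration, a minimal sketch of this traditional autoregressive scheme, in which the value at time t is a weighted linear combination of the previous 5 values plus noise (the weights and noise scale are invented for illustration):

import numpy as np

# Traditional autoregressive generation (illustrative): the next value depends on
# a weighted linear combination of the previous 5 values within the window.
rng = np.random.default_rng(4)
weights = np.array([0.35, 0.25, 0.2, 0.12, 0.08])    # weights for x_{t-1} .. x_{t-5}

x = list(rng.normal(size=5))                         # seed values x_1 .. x_5
for t in range(5, 50):
    prev = np.array(x[-1:-6:-1])                     # x_{t-1}, x_{t-2}, ..., x_{t-5}
    x.append(weights @ prev + rng.normal(scale=0.1)) # value at t depends only on the window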



FIG. 12B depicts motif causality time series data generation 1210, in accordance with embodiments of the present disclosure.


As shown in FIG. 12B, the value of x4 at time step t (i.e., x4 value 1220) depends on the values of x1 at time t−3 (i.e., x1 value 1213) and t−5 (i.e., x1 value 1215), the values of x2 at time t−1 (i.e., x2 value 1221) and t−2 (i.e., x2 value 1222), and the value of x3 at time t−4 (i.e., x3 value 1234). Similarly, the value of x5 at time step t (i.e., x5 value 1230) depends on the value of x1 at time t−1 (i.e., x1 value 1211), the value of x2 at time t−4 (i.e., x2 value 1224), and the values of x3 at time t−2 (i.e., x3 value 1232) and t−3 (i.e., x3 value 1233).
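A minimal sketch of this dependence structure is shown below; the coefficients, the dictionary-based representation, and the stand-in data are illustrative assumptions, the point being that only the lags with a causal impact, drawn from several different time series, contribute to the generated value.

import numpy as np

# Illustrative causal lags for x4 at time t (cf. FIG. 12B): (source series, lag) -> coefficient.
causal_lags_x4 = {
    ("x1", 3): 0.30, ("x1", 5): 0.15,
    ("x2", 1): 0.25, ("x2", 2): 0.10,
    ("x3", 4): 0.20,
}

rng = np.random.default_rng(5)
series = {name: list(rng.normal(size=6)) for name in ("x1", "x2", "x3", "x4")}

t = len(series["x4"])       # next time step to generate
x4_t = sum(coef * series[src][t - lag] for (src, lag), coef in causal_lags_x4.items())
series["x4"].append(x4_t)   # only the causal lags contribute; other lags are ignored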


Advantageously, motif causality time series data generation 1210 uses only the previous lags that have a causal impact, finds relationships across motifs from different time series, and allows DP-GAN 500 to learn the relationships (patterns) amongst the sequences of important events in the traces that contribute to time series construction.


For long time series, this is particularly advantageous because networks can easily be overwhelmed when trained to learn from every previous time step. Instead, by only conserving relationships related to sequences of important motifs, DP-GAN 500 learns to output realistic sequences of time steps in the traces more quickly.


For example, for glucose traces, to predict the next glucose value at time t, a large peak in glucose (e.g., a hyperglycemic incident) 6 time steps or more in the past (e.g., at t−6 or earlier) is more informative than the immediate past 5 time steps (e.g., t−1, t−2, t−3, t−4, t−5). This is due to the strong effect of the event (e.g., we know that the glucose values must come back down from the peak, regardless of whether the previous glucose values are 330→329 or 290→289). As a result, we can take advantage of the patterns amongst these types of events (for example, if we see a large peak motif, we know a decreasing slope motif will show up after it).



FIG. 13 depicts a comparison 1300 of longitudinal time series data and synthetic time series data, in accordance with embodiments of the present disclosure.


Longitudinal time series data 1310 includes measured glucose values (mg/dL) for 288 time steps. Synthetic time series data 1320 includes synthetic glucose values (mg/dL) for 288 time steps generated by DP-GAN 500. As the samples of the synthetic traces show, the patterns in the traces look very realistic, i.e., they have overall structures very similar to those of the real traces in terms of sequences of peaks, troughs, etc.



FIG. 14 depicts a flow chart 1400 representing functionality associated with generating synthetic data, in accordance with embodiments of the present disclosure.


At 1410, longitudinal time series data are received. In many embodiments, the longitudinal time series data are unlabeled and univariate.


As described above, the longitudinal time series data may be medical longitudinal time series data. Generally, a patient's medical longitudinal time series data are bounded and include at least 50 time steps. Certain medical longitudinal time series data may include 100 or more time steps, such as, for example, 288 time steps for 24 hours of continuous glucose monitoring (CGM) data (i.e., 12 measurements/hour).


At 1420, a neural network is trained, based on the longitudinal time series data, to generate synthetic time series data that satisfies a privacy metric.


In many embodiments, the neural network may be a DP-GAN, such as, for example, DP-GAN 500. Training DP-GAN 500 is described above with reference to FIGS. 6 to 11C.


Additionally, as discussed above, aggregated motif causality matrix 513 (M_A), which preserves privacy, is provided to generator module 530 during its training to focus generator module 530 on retaining the important motifs (events) within the traces of embedded original data x_e. Noise may also be added to the weights of embedder module 522, recovery module 524, generator module 530 and discriminator module 540 to ensure that each network upholds differential privacy.
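As a non-authoritative sketch, adding noise to a network's weights might look as follows, assuming a PyTorch module; a complete differential-privacy mechanism, including gradient clipping and noise calibrated to the privacy budget, is a design choice beyond this illustration.

import torch
from torch import nn

def add_weight_noise(module, noise_std=0.01):
    """Add Gaussian noise to a module's weights (illustrative only).

    A full differential-privacy treatment would also involve gradient clipping
    and noise calibrated to a privacy budget; those details are assumptions
    outside this sketch.
    """
    with torch.no_grad():
        for param in module.parameters():
            param.add_(torch.randn_like(param) * noise_std)

# Example: a stand-in for one of the DP-GAN sub-networks (e.g., the generator).
generator = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 24))
add_weight_noise(generator, noise_std=0.01)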


The many features and advantages of the disclosure are apparent from the detailed specification, and, thus, it is intended by the appended claims to cover all such features and advantages of the disclosure which fall within the scope of the disclosure. Further, since numerous modifications and variations will readily occur to those skilled in the art, it is not desired to limit the disclosure to the exact construction and operation illustrated and described, and, accordingly, all suitable modifications and equivalents may be resorted to that fall within the scope of the disclosure.

Claims
  • 1. A method for generating synthetic data, the method comprising: retrieving longitudinal time series data that are unlabeled and univariate; and training a neural network to generate synthetic time series data that satisfies a privacy metric based on the longitudinal time series data.
  • 2. The method of claim 1, wherein the longitudinal time series data include at least 50 measured glucose levels from each person of a plurality of persons.
  • 3. The method of claim 1, wherein the privacy metric defines an upper limit for an amount of allowed privacy loss.
  • 4. The method of claim 3, wherein the neural network is a differential-privacy generative adversarial network (DP-GAN), and the training the neural network includes, at a motif causality module: receiving a first portion of the longitudinal time series data for a first group of persons; and generating an aggregate motif causality matrix based on the first portion of the longitudinal time series data, the aggregate motif causality matrix identifying causal relationships between motifs within the first portion of the longitudinal time series data.
  • 5. The method of claim 4, wherein the motif causality module includes a plurality of motif networks, and the generating the aggregate motif causality matrix includes: partitioning the first portion of the longitudinal time series data into data partitions, each data partition being associated with a different motif network and including a plurality of motifs, each motif being an ordered sequence of data values from the first portion of the longitudinal time series data; for each motif network, generating a motif causality matrix from the associated data partition; and aggregating the motif causality matrices into the aggregate motif causality matrix based on the privacy metric.
  • 6. The method of claim 5, wherein each motif network includes a plurality of recurrent neural networks (RNNs), each RNN receiving motif data from the associated data partition for a different motif.
  • 7. The method of claim 4, wherein the training the neural network includes: at an embedder module: receiving a second portion of the longitudinal time series data for a second group of persons different than the first group of persons; generating embedded time series data based on the second portion of the longitudinal time series data, the embedded time series data having a lower dimensionality than the second portion of the longitudinal time series data; at a generator module: generating embedded synthetic time series data based on the aggregate motif causality matrix and the embedded time series data; at a recovery module: generating recovered longitudinal time series data based on the embedded time series data; generating synthetic time series data based on the embedded synthetic time series data, the synthetic time series data having the same dimensionality as the second portion of the longitudinal time series data; at a discriminator module: determining whether each data value in the embedded time series data is real or synthetic, and determining whether each data value in the embedded synthetic time series data is real or synthetic; and training, based on a plurality of loss functions, the embedder module, the recovery module, the generator module and the discriminator module to satisfy a performance metric and the privacy metric.
  • 8. The method of claim 7, wherein the training the embedder module, the recovery module, the generator module and the discriminator module includes: adding noise to weights associated with the embedder module, the recovery module, the generator module and the discriminator module based on the privacy metric.
  • 9. The method of claim 8, wherein the training the embedder module, the recovery module, the generator module and the discriminator module includes: training the embedder module and the recovery module based on a reconstruction loss and a stepwise loss; training the generator module based on at least one of the stepwise loss, a distributional loss, a motif loss, and a synthetic data adversarial loss; and training the discriminator module based on the synthetic data adversarial loss and an embedded data adversarial loss.
  • 10. The method of claim 9, wherein the motif loss is associated with data sequence patterns within the second portion of the longitudinal time series data.
  • 11. The method of claim 7, wherein the embedder module, the recovery module, the generator module and the discriminator module each include an RNN.
  • 12. A system for generating synthetic data, the system comprising: a memory configured to store longitudinal time series data that are unlabeled and univariate; and at least one processor, coupled to the memory, configured to: train a neural network to generate synthetic time series data that satisfies a privacy metric based on the longitudinal time series data.
  • 13. The system of claim 12, wherein the privacy metric defines an upper limit for an amount of allowed privacy loss.
  • 14. The system of claim 13, wherein: the neural network is a differential-privacy generative adversarial network (DP-GAN) including a motif causality module having a plurality of motif networks, an embedder module, a generator module, a discriminator module and a recovery module; the motif causality module is trained based on a first portion of the longitudinal time series data for a first group of persons; and the embedder module, the recovery module, the generator module and the discriminator module are trained based on a second portion of the longitudinal time series data for a second group of persons different than the first group of persons.
  • 15. The system of claim 14, wherein: each motif network includes a plurality of recurrent neural networks (RNNs); and each RNN receives motif data for a different motif from an associated data partition of the first portion of the longitudinal time series data.
  • 16. The system of claim 15, wherein: the motif causality module generates an aggregate motif causality matrix based on the first portion of the longitudinal time series data; and the aggregate motif causality matrix identifies causal relationships between motifs within the first portion of the longitudinal time series data.
  • 17. A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to: retrieve longitudinal time series data that are unlabeled and univariate; and train a neural network to generate synthetic time series data that satisfies a privacy metric based on the longitudinal time series data.
  • 18. The non-transitory computer-readable medium of claim 17, wherein the privacy metric defines an upper limit for an amount of allowed privacy loss.
  • 19. The non-transitory computer-readable medium of claim 18, wherein: the neural network is a differential-privacy generative adversarial network (DP-GAN) including a motif causality module having a plurality of motif networks, an embedder module, a generator module, a discriminator module and a recovery module; the motif causality module is trained based on a first portion of the longitudinal time series data for a first group of persons; and the embedder module, the recovery module, the generator module and the discriminator module are trained based on a second portion of the longitudinal time series data for a second group of persons different than the first group of persons.
  • 20. The non-transitory computer-readable medium of claim 19, wherein: each motif network includes a plurality of recurrent neural networks (RNNs); and each RNN receives motif data for a different motif from an associated data partition of the first portion of the longitudinal time series data.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and benefit of U.S. Provisional Application No. 63/481,431, filed Jan. 25, 2023, which is assigned to the assignee hereof and hereby expressly incorporated herein in its entirety as if fully set forth below and for all applicable purposes.
