END-TO-END LEARNING IN COMMUNICATION SYSTEMS

Description

FIELD

The present specification relates to learning in communication systems.

BACKGROUND

A simple communications system includes a transmitter, a transmission channel, and a receiver. The design of such communications systems may involve the separate design and optimisation of each part of the system. An alternative approach is to consider the entire communication system as a single system and to seek to optimise the entire system. Although some attempts have been made in the prior art, there remains scope for further developments in this area.

SUMMARY

In a first aspect, this specification describes an apparatus comprising: means for obtaining or generating a transmitter-training sequence of messages for a transmission system, wherein the transmission system comprises a transmitter, a channel and a receiver, wherein the transmitter includes a transmitter algorithm having at least some trainable weights and the receiver includes a receiver algorithm having at least some trainable weights (the transmitter algorithm may be implemented as a differentiable parametric function and the receiver algorithm may be implemented as a differentiable parametric function); means for transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system (wherein the perturbations may be zero-mean Gaussian perturbations); means for receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and means for training at least some weights of the transmitter algorithm based on the first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.

The means for training the at least some weights of the transmitter algorithm may make use of a distribution to generate the perturbations applied to the transmitter-training sequence of messages.

The first loss function data may be related to one or more of block error rate, bit error rate and categorical cross-entropy.

The apparatus may further comprise means for repeating the training of the at least some weights of the transmitter algorithm until a first condition is reached. The first condition may, for example, be a defined number of iterations and/or a defined performance level.

The means for training may further comprise optimising one or more of a batch size of the transmitter-training sequence of messages, a learning rate, and a distribution of the perturbations applied to the perturbed versions of the transmitter-training sequence of messages.

The apparatus may further comprise: means for obtaining or generating a receiver-training sequence of messages for transmission over the transmission system; means for transmitting the receiver-training sequence of messages over the transmission system; means for generating or obtaining second receiver loss function data, the second receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; and means for training at least some weights of the receiver algorithm based on the second receiver loss function data. The second loss function may, for example, be related to one or more of block error rate, bit error rate and categorical cross-entropy.

Some forms of the invention may further comprise means for repeating the training of the at least some weights of the receiver algorithm until a second condition is reached. The second condition may, for example, be a defined number of iterations and/or a defined performance level.

Some forms of the invention may further comprise means for repeating both the training of the at least some weights of the transmitter algorithm and the training of the at least some weights of the receiver algorithm until a third condition is reached.

In some forms of the invention, at least some weights of the transmit and receive algorithms may be trained using stochastic gradient descent.

In some forms of the invention, the apparatus may further comprise means for repeating the training of the at least some weights of the transmitter algorithm until a first condition is reached and means for repeating the training of the at least some weights of the receiver algorithm until a second condition is reached.

In some forms of the invention, the transmitter algorithm may comprise a transmitter neural network and/or the receiver algorithm may comprise a receiver neural network.

In a second aspect, this specification describes an apparatus comprising: means for obtaining or generating a receiver-training sequence of messages for transmission over a transmission system, wherein the transmitter includes a transmitter algorithm (e.g. a transmitter neural network) having at least some trainable weights and the receiver includes a receiver algorithm (e.g. a receiver neural network) having at least some trainable weights; means for transmitting the receiver-training sequence of messages over the transmission system; means for generating or obtaining second receiver loss function data, the second receiver loss function data being generated based on a receiver-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; means for training at least some weights of the receiver algorithm based on the second receiver loss function data; means for obtaining or generating a transmitter-training sequence of messages for the transmission system; means for transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; means for receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and means for training at least some weights of the transmitter algorithm based on the first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.

The apparatus of the second aspect may further comprise means for repeating the training of the at least some weights of the transmitter algorithm until a first condition is reached and means for repeating the training of the at least some weights of the receiver algorithm until a second condition is reached. Furthermore, the apparatus may further comprise means for repeating both the training of the at least some weights of the transmitter algorithm and the training of the at least some weights of the receiver algorithm until a third condition is reached.

In at least some forms of the invention, the means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.

In a third aspect, this specification describes a method comprising: obtaining or generating a transmitter-training sequence of messages for a transmission system, wherein the transmission system comprises a transmitter, a channel and a receiver, wherein the transmitter includes a transmitter algorithm having at least some trainable weights and the receiver includes a receiver algorithm having at least some trainable weights (the transmitter algorithm may be implemented as a differentiable parametric function and the receiver algorithm may be implemented as a differentiable parametric function); transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.

The method may further comprise: obtaining or generating a receiver-training sequence of messages for transmission over the transmission system; transmitting the receiver-training sequence of messages over the transmission system; generating or obtaining second receiver loss function data, the second receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; and training at least some weights of the receiver algorithm based on the second receiver loss function data.

In a fourth aspect, this specification describes a method comprising: obtaining or generating a receiver-training sequence of messages for transmission over a transmission system, wherein the transmitter includes a transmitter algorithm having at least some trainable weights and the receiver includes a receiver algorithm having at least some trainable weights; transmitting the receiver-training sequence of messages over the transmission system; generating or obtaining second receiver loss function data, the second receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; training at least some weights of the receiver algorithm based on the second loss function; obtaining or generating a transmitter-training sequence of messages for transmission over the transmission system; transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on the first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.

In a fifth aspect, this specification describes an apparatus configured to perform any method as described with reference to the third or fourth aspect.

In a sixth aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the third or fourth aspect.

In a seventh aspect, this specification describes a computer program comprising instructions stored thereon for performing at least the following: obtaining or generating a transmitter-training sequence of messages for a transmission system, wherein the transmission system comprises a transmitter, a channel and a receiver, wherein the transmitter includes a transmitter algorithm and the receiver includes a receiver algorithm; transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages. The computer program may further comprise instructions stored thereon for performing at least the following: obtaining or generating a receiver-training sequence of messages for transmission over the transmission system; transmitting the receiver-training sequence of messages over the transmission system; generating or obtaining a second receiver loss function data, the second receiver loss function data being generated based on received-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; and training at least some weights of the receiver algorithm based on the second receiver loss function data.

In an eighth aspect, this specification describes a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: obtaining or generating a transmitter-training sequence of messages for a transmission system, wherein the transmission system comprises a transmitter, a channel and a receiver, wherein the transmitter includes a transmitter algorithm and the receiver includes a receiver algorithm; transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages. The non-transitory computer-readable medium may further comprise program instructions stored thereon for performing at least the following: obtaining or generating a receiver-training sequence of messages for transmission over the transmission system; transmitting the receiver-training sequence of messages over the transmission system; generating or obtaining a second receiver loss function data, the second receiver loss function data being generated based on received-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; and training at least some weights of the receiver algorithm based on the second receiver loss function data.

In a ninth aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: obtain or generate a transmitter-training sequence of messages for a transmission system, wherein the transmission system comprises a transmitter, a channel and a receiver, wherein the transmitter includes a transmitter algorithm and the receiver includes a receiver algorithm; transmit perturbed versions of the transmitter-training sequence of messages over the transmission system; receive first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and train at least some weights of the transmitter algorithm based on first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages. The computer code may further cause the apparatus to: obtain or generate a receiver-training sequence of messages for transmission over the transmission system; transmit the receiver-training sequence of messages over the transmission system; generate or obtain a second receiver loss function data, the second receiver loss function data being generated based on received-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; and train at least some weights of the receiver algorithm based on the second receiver loss function data.

BRIEF DESCRIPTION OF THE DRAWINGS

Example embodiments will now be described, by way of non-limiting examples, with reference to the following schematic drawings, in which:

FIG. 1 is a block diagram of an exemplary end-to-end communication system;

FIG. 2 is a block diagram of an exemplary transmitter used in an exemplary implementation of the system of FIG. 1;

FIG. 3 is a block diagram of an exemplary receiver used in an exemplary implementation of the system of FIG. 1;

FIG. 4 is a flow chart showing an algorithm in accordance with an exemplary embodiment;

FIG. 5 is a flow chart showing an algorithm in accordance with an exemplary embodiment;

FIG. 6 is a block diagram of an exemplary end-to-end communication system in accordance with an example embodiment;

FIG. 7 is a flow chart showing an algorithm in accordance with an exemplary embodiment;

FIG. 8 is a block diagram of an exemplary end-to-end communication system in accordance with an example embodiment;

FIG. 9 is a block diagram of components of a system in accordance with an exemplary embodiment; and

FIGS. 10a and 10b show tangible media, respectively a removable memory unit and a compact disc (CD) storing computer-readable code which when run by a computer performs operations according to embodiments.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of an exemplary communication system, indicated generally by the reference numeral 1, in which exemplary embodiments may be implemented. The system 1 includes a transmitter 2, a channel 4 and a receiver 6. Viewed at a system level, the system 1 converts an input symbol (s) (also called a message) received at the input to the transmitter 2 into an output symbol (ŝ) at the output of the receiver 6.

The transmitter 2 includes a module 10 (such as a neural network) for implementing a transmitter algorithm. Similarly, the receiver 6 includes a module 14 (such as a neural network) for implementing a receiver algorithm. As described in detail below, the modules 10 and 14 are trained in order to optimise the performance of the system as a whole.

The transmitter algorithm implemented by the module 10 may be implemented as a differentiable parametric function and may include at least some trainable weights (which may be trainable through stochastic gradient descent). Similarly, the receiver algorithm implemented by the module 14 may be implemented as a differentiable parametric function and may include at least some trainable weights (which may be trainable through stochastic gradient descent).

The transmitter 2 seeks to communicate one out of M possible messages s ∈ ℳ = {1, 2, . . ., M} to the receiver 6. To this end, the transmitter 2 sends a complex-valued vector representation x = x(s) ∈ ℂⁿ of the message through the channel 4. Generally, the transmitter hardware imposes constraints on x, e.g., an energy constraint ∥x∥₂² ≤ n, an amplitude constraint |x_i| ≤ 1 ∀i, or an average power constraint 𝔼[|x_i|²] ≤ 1 ∀i. The channel is described by the conditional probability density function (pdf) p(y|x), where y ∈ ℂⁿ denotes the received signal. Upon reception of y, the receiver produces the estimate ŝ of the transmitted message s.

FIG. 2 is a block diagram showing details of an exemplary implementation of the transmitter 2 described above. As shown in FIG. 2, the transmitter 2 includes an embedding module 22, a dense layer of one or more units 24 (e.g. one or more neural networks), a complex vector generator 26 and a normalization module 28. The modules within the transmitter 2 are provided by way of example and modifications are possible. For example, the complex vector generator 26 and the normalization module 28 could be provided in a different order.

The message index s is fed into the embedding module 22, embedding: ℳ ↦ ℝ^{n_emb}, that transforms s into an n_emb-dimensional real-valued vector.

The embedding module 22 can optionally be followed by several dense neural network (NN) layers 24 with possibly different activation functions (such as ReLU, tanh, sigmoid, linear etc.). The final layer of the neural network has 2n output dimensions and a linear activation function. If no dense layer is used, n_emb=2n.

The output of the dense layers 24 is converted to a complex-valued vector (by complex vector generator 26) through the mapping ℝ2ℂ: ℝ²ⁿ ↦ ℂⁿ, which could be implemented as ℝ2ℂ(z) = z₀ⁿ⁻¹ + j z_n²ⁿ⁻¹ (i.e. the first n elements of z form the real part and the last n elements form the imaginary part).

A normalization is applied by the normalization module 28 that ensures that power, amplitude or other constraints are met. The result of the normalization process is the transmit vector x of the transmitter 2 (where x ∈ ℂⁿ). As noted above, the order of the complex vector generation and the normalization could be reversed.

The transmitter 2 defines the following mapping: TX: ℳ ↦ ℂⁿ, ℳ = {0, . . ., M−1}. In other words, TX maps an integer from the set ℳ to an n-dimensional complex-valued vector. FIG. 2 illustrates how this mapping can be implemented as an NN. Other NN architectures are possible and this illustration serves just as an example.
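By way of illustration only, the chain of embedding, dense layer, complex vector generation and normalization described above can be sketched in NumPy. This is a minimal sketch under stated assumptions and not the implementation of the embodiment: the sizes M, n and n_emb, the single linear dense layer, the random initial weights, the helper names r2c, normalize_energy and tx, and the choice of the energy constraint ∥x∥₂² = n are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
M, n, n_emb = 16, 4, 8

# Trainable weights (randomly initialised here; training is not shown).
embedding = rng.normal(size=(M, n_emb))      # stands in for embedding module 22
W = rng.normal(size=(n_emb, 2 * n)) * 0.1    # final dense layer, linear activation
b = np.zeros(2 * n)

def r2c(z):
    """Map a real vector of length 2n to a complex vector of length n."""
    half = z.shape[-1] // 2
    return z[..., :half] + 1j * z[..., half:]

def normalize_energy(x):
    """Scale x so that ||x||_2^2 = n (one possible normalization)."""
    n_dim = x.shape[-1]
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return np.sqrt(n_dim) * x / norm

def tx(s):
    """TX: message indices in {0, ..., M-1} -> complex vectors in C^n."""
    z = embedding[s] @ W + b
    return normalize_energy(r2c(z))

x = tx(np.array([0, 3, 15]))   # three transmit vectors, one row per message
```

Swapping normalize_energy for a per-element clipping or scaling would give an amplitude constraint instead; which constraint applies depends on the transmitter hardware.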

FIG. 3 is a block diagram showing details of an exemplary implementation of the receiver 6 described above. As shown in FIG. 3, the receiver 6 includes a real vector generator 32, one or more layers 34 (e.g. one or more neural networks) and a softmax module 36. As described further below, the output of the softmax module is a probability vector that is provided to the input of an arg max module 38. The modules within the receiver 6 are provided by way of example and modifications are possible.

The received vector y ∈ ℂⁿ is transformed (by real vector generator 32) into a real-valued vector of 2n dimensions through the mapping ℂ2ℝ: ℂⁿ ↦ ℝ²ⁿ, which could be implemented as ℂ2ℝ(z) = [ℜ{z}^T, ℑ{z}^T]^T.

The result is fed into the one or more layers 34, which layers may have different activation functions such as ReLU, tanh, sigmoid, linear, etc. The last layer has M output dimensions to which a softmax activation is applied (by softmax module 36). This generates the probability vector p ∈ ℝ₊^M, whose ith element [p]_i can be interpreted as Pr(s=i|y). A hard decision for the message index is obtained as ŝ=arg max(p) by arg max module 38.

The receiver 6 defines the mapping:

$RX : ℂ^{n} \mapsto \{ p \in ℝ_{+}^{M} \mid \sum_{i = 1}^{M} p_{i} = 1 \} \times ℳ$

In other words, the receiver 6 maps an n-dimensional complex-valued vector to an M-dimensional probability vector and an integer from the set ℳ. The example above describes how this may be implemented using a neural network architecture, although other architectures are possible. For example, the number of dimensions of y can be different from n in case the channel provides a different number of relevant outputs.
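The receiver mapping RX can likewise be sketched in NumPy. Again this is only an illustrative sketch: the sizes M and n, the use of a single dense layer with randomly initialised weights, and the helper names c2r, softmax and rx are assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

M, n = 16, 4  # hypothetical sizes

# Trainable weights of a single dense layer with M outputs (assumption).
W = rng.normal(size=(2 * n, M)) * 0.1
b = np.zeros(M)

def c2r(y):
    """Map a complex vector of length n to a real vector of length 2n."""
    return np.concatenate([y.real, y.imag], axis=-1)

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def rx(y):
    """RX: received vectors in C^n -> (probability vectors p, hard decisions s_hat)."""
    p = softmax(c2r(y) @ W + b)
    return p, np.argmax(p, axis=-1)

# Five stand-in received vectors (random, not produced by a real channel).
y = rng.normal(size=(5, n)) + 1j * rng.normal(size=(5, n))
p, s_hat = rx(y)
```

Each row of p sums to one and can be read as the estimated Pr(s=i|y); s_hat holds the corresponding hard decisions.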

FIG. 4 is a flow chart showing an algorithm, indicated generally by the reference numeral 40, in accordance with an exemplary embodiment.

The algorithm 40 starts at operation 42, where the transmitter 2 and the receiver 6 of the transmission system 1 are initialised. Note that the algorithm 40 acts on the system 1, which system includes a real channel 4.

At operation 44 of the algorithm 40, the receiver 6 is trained. As discussed in detail below, the receiver 6 is trained based on known training data sent by the transmitter 2 using the channel 4. The trainable parameters of the receiver algorithm (e.g. the receiver layers 34, which may be implemented using neural networks) are optimised based on stochastic gradient descent (SGD), using the signals received via the channel and the corresponding known messages. The goal of the optimisation is to improve a chosen performance metric (or reward), such as block error rate (BLER), bit error rate (BER), categorical cross-entropy, etc.

At operation 46, it is determined whether the training of the receiver is complete. If so, the algorithm 40 moves to operation 48, otherwise the algorithm returns to operation 44 and further training of the receiver is conducted.

At operation 48, the transmitter is trained. In the operation 48, the transmitter 2 sends a sequence of known messages to the receiver 6. However, the transmitter signals associated with each message are slightly perturbed, for example by adding random vectors taken from a known distribution. The receiver computes the chosen performance metric or reward (such as BLER, BER, categorical cross-entropy, as discussed above) for the received signals and feeds the metric or reward data back to the transmitter. Note that the receiver is not optimised as part of the operation 48.

The trainable parameters of the transmitter algorithm (e.g. the transmitter layers 24, which may be implemented using neural networks) are optimised based on stochastic gradient descent (SGD) by estimating the gradient of the metric or reward with respect to its trainable parameters using the knowledge of the transmitted messages and signals, as well as the known distribution of the random perturbations.

At operation 50, it is determined whether the training of the transmitter is complete. If so, the algorithm 40 moves to operation 52, otherwise the algorithm returns to operation 48 and further training of the transmitter is conducted.

At operation 52, it is determined whether or not the training for the transmission system is complete. If so, the algorithm terminates at operation 54, otherwise the algorithm returns to operation 44 so that both the receiver and transmitter training can be repeated.

Thus, the communication system 1 is trained using a two-step process. The two steps may, for example, be carried out iteratively until a desired performance level is obtained and/or until a predefined number of iterations have been completed. There are a number of alternative mechanisms for implementing the operations 46, 50 and/or 52, such as stopping when a loss function being used has not decreased for a given number of iterations or stopping when a metric such as block error rate (BLER) has reached a desired level.
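The control flow of the algorithm 40 can be summarised with a short Python sketch. It is a hedged illustration rather than a definitive implementation: the function name alternating_training, its parameters, and the two caller-supplied callables are hypothetical, and fixed iteration counts plus an optional loss target are only one of the possible stopping mechanisms mentioned above.

```python
def alternating_training(train_receiver_step, train_transmitter_step,
                         rx_iters=10, tx_iters=10, max_rounds=5,
                         target_loss=None):
    """Alternate receiver training (operation 44) and transmitter training (operation 48).

    Both step arguments are caller-supplied callables that perform one training
    iteration and return the current loss as a float. rx_iters and tx_iters
    must be at least 1.
    """
    history = []
    for _ in range(max_rounds):                 # operation 52: outer stop test
        for _ in range(rx_iters):               # operations 44 and 46
            rx_loss = train_receiver_step()
        for _ in range(tx_iters):               # operations 48 and 50
            tx_loss = train_transmitter_step()
        history.append((rx_loss, tx_loss))
        if target_loss is not None and rx_loss <= target_loss:
            break                               # desired performance level reached
    return history
```

The returned history holds one (receiver loss, transmitter loss) pair per outer round, which makes it straightforward to substitute a different stopping rule, such as stopping when the loss has not decreased for a given number of rounds.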

FIG. 5 is a flow chart showing an algorithm, indicated generally by the reference numeral 60, in accordance with an exemplary embodiment. The algorithm 60 provides further detail regarding the receiver training operation 44 of the algorithm 40 described above.

FIG. 6 is a block diagram of an exemplary end-to-end communication system, indicated generally by the reference numeral 70, in accordance with an example embodiment. The system 70 includes the transmitter 2, channel 4 and receiver 6 described above with reference to FIG. 1. The system 70 demonstrates aspects of the algorithm 60.

The algorithm 60 starts at operation 62, where the following steps are conducted:

1. Draw a random set of N_R messages S_R = {s_R,i, i = 1, . . ., N_R} uniformly from ℳ.

2. Compute the corresponding output vectors x_R,i=TX(s_R,i).

At operation 64, the channel 4 is used to transmit vectors from the transmitter 2 to the receiver 6 as follows:

3. Transmit the vectors x_R,i over the channel using the transmitter. The corresponding channel outputs at the receiver are called y_R,i, i = 1, . . ., N_R.

At operation 66, a loss function is generated and stochastic gradient descent used for training the receiver as follows (and as indicated in FIG. 6):

4. Compute {p_R,i, ŝ_R,i} = RX(y_R,i) for i = 1, . . ., N_R.

5. Apply one step of SGD to the trainable parameters (or weights) of the receiver NN, using the loss function

$\begin{matrix} L_{R} = \frac{1}{N_{R}} \sum_{i = 1}^{N_{R}} L_{R, i} & (3) \end{matrix}$

where L_R,i = −log([p_R,i]_{s_R,i}) is the categorical cross entropy between the input message and the output vector p_R,i.

It should be noted that the batch size N_R as well as the learning rate (and possibly other parameters of the chosen SGD variant, e.g. ADAM, RMSProp, Momentum) could be optimization parameters of the training operation 44.
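Steps 4 and 5 of the receiver training can be illustrated with a small NumPy sketch. It assumes, purely for illustration, that the receiver consists of a single softmax layer with weights W and biases b, that plain SGD with a fixed learning rate is used, and that a fixed random constellation with additive noise stands in for the transmitter and channel; the function name receiver_sgd_step and all sizes are hypothetical.

```python
import numpy as np

def receiver_sgd_step(W, b, y, s, lr=0.1):
    """One SGD step on L_R = (1/N_R) * sum_i -log([p_i]_{s_i}) for a softmax layer.

    W: (2n, M) weights, b: (M,) biases, y: (N_R, n) complex channel outputs,
    s: (N_R,) transmitted message indices. Returns the updated (W, b) and the
    loss evaluated before the update.
    """
    feats = np.concatenate([y.real, y.imag], axis=-1)
    logits = feats @ W + b
    logits -= logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=-1, keepdims=True)
    n_r = len(s)
    loss = -np.mean(np.log(p[np.arange(n_r), s]))
    grad_logits = p.copy()
    grad_logits[np.arange(n_r), s] -= 1.0     # d(loss_i)/d(logits) = p - one_hot(s)
    grad_logits /= n_r
    W_new = W - lr * feats.T @ grad_logits
    b_new = b - lr * grad_logits.sum(axis=0)
    return W_new, b_new, loss

# Toy data: a fixed random constellation plus noise stands in for TX and channel.
rng = np.random.default_rng(2)
M, n, N_R = 4, 2, 64
constellation = rng.normal(size=(M, n)) + 1j * rng.normal(size=(M, n))
s = rng.integers(0, M, size=N_R)
noise = 0.05 * (rng.normal(size=(N_R, n)) + 1j * rng.normal(size=(N_R, n)))
y = constellation[s] + noise

W, b = np.zeros((2 * n, M)), np.zeros(M)
losses = []
for _ in range(50):
    W, b, loss = receiver_sgd_step(W, b, y, s)
    losses.append(loss)
```

With all-zero initial weights the first loss equals log M, since every message is initially assigned probability 1/M; subsequent steps reduce the loss on this toy batch.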

FIG. 7 is a flow chart showing an algorithm, indicated generally by the reference numeral 80, in accordance with an exemplary embodiment. The algorithm 80 provides further detail regarding the transmitter training operation 48 of the algorithm 40 described above.

FIG. 8 is a block diagram of an exemplary end-to-end communication system, indicated generally by the reference numeral 90, in accordance with an example embodiment. The system 90 includes the transmitter 2, channel 4 and receiver 6 described above with reference to FIG. 1. The system also includes a perturbation module 92 between the transmitter 2 and the channel 4. The system 90 demonstrates aspects of the algorithm 80.

The algorithm 80 starts at operation 82, where the following steps are conducted:

1. Draw a random set of N_T messages S_T = {s_T,i, i = 1, . . ., N_T} uniformly from ℳ.

2. Compute the corresponding output vectors x_T,i = TX(s_T,i).

3. Draw N_T perturbation vectors ε_i ∈ ℂⁿ, i = 1, . . ., N_T, independently according to some distribution p(ε). For example, p(ε) could be the multivariate complex Gaussian distribution 𝒞𝒩(0, σ²I_n) with some small variance σ².

The perturbation vectors ε_i are added to the transmitter output using the perturbation module 92.

At operation 84, the channel 4 is used to transmit perturbed vectors as follows:

4. Transmit the perturbed vectors x̂_T,i = x_T,i + ε_i over the channel using the transmitter. Denote by p(x̂_T,i | x_T,i) the resulting conditional pdf of x̂_T,i for a given x_T,i. The corresponding channel outputs at the receiver are called y_T,i, i = 1, . . ., N_T.

At operation 86, a loss function is generated and stochastic gradient descent used for training the transmitter as follows:

5. Compute {p_T,i}=RX(y_T,i) for i=1, . . ., N_T.

6. Compute the loss functions L_T,i, given as

$\begin{matrix} L_{T, i} = - \log ({[p_{T, i}]}_{s_{T, i}}) & (4) \end{matrix}$

where −log([p_T,i]_{s_T,i}) is the categorical cross entropy between the input message and the output vector p_T,i.

7. Apply one step of SGD to the trainable parameters (or weights) of the transmitter NN using the loss function

$\begin{matrix} L_{T} = \frac{1}{N_{T}} \sum_{i = 1}^{N_{T}} L_{T, i} \log p ({\hat{x}}_{T, i} | x_{T, i}) & (5) \end{matrix}$

In the transmitter training, it should be noted that:

1. The loss function L_T,i could take other forms and, in contrast to the loss function used for receiver training in Section 1.4, does not necessarily need to be differentiable.

2. The parameters of the chosen distribution p(ε) can be chosen to be trainable. E.g., if p(ε) = 𝒞𝒩(μ, σ²I_n), then σ and μ can be made trainable parameters of the transmitter NN.

3. The batch size N_T as well as the learning rate (and possibly other parameters of the chosen SGD variant, e.g., ADAM, RMSProp, Momentum) are optimization parameters. The stop criterion in Step 8 can take multiple forms: stop after a fixed number of training iterations, stop when the loss function L_T has not decreased during a fixed number of iterations, or stop when the loss or another associated metric such as the BLER

$\frac{1}{N_{T}} \sum_{i} \mathbb{1}({\hat{s}}_{T, i} \neq s_{T, i})$

has reached a desired value. The criteria to repeat can be similar.

4. Note that (5) is simply a function of which the gradient w.r.t. the trainable parameters θ of the transmitter is computed. The function ∇_θ L_T is also known as the policy gradient.
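The gradient of (5) can be estimated without differentiating through the channel, as the following NumPy sketch illustrates. It is a simplified example under explicit assumptions: the transmitter is taken to be a lookup table theta (embedding only, no dense layers and no normalization), the perturbations are drawn from 𝒞𝒩(0, σ²I_n), and the per-message losses are replaced by placeholder values where the receiver feedback would normally be used. The function names sample_perturbations and policy_gradient are hypothetical.

```python
import numpy as np

def sample_perturbations(rng, num, n, sigma):
    """Draw num vectors from the complex Gaussian CN(0, sigma^2 I_n)."""
    scale = sigma / np.sqrt(2.0)   # real and imaginary parts each have variance sigma^2 / 2
    return scale * (rng.normal(size=(num, n)) + 1j * rng.normal(size=(num, n)))

def policy_gradient(theta, s, eps, losses, sigma):
    """Estimate the gradient of L_T = (1/N_T) sum_i L_{T,i} log p(x_hat_i | x_i) w.r.t. theta.

    theta: (M, 2n) real table with x_i = theta[s_i, :n] + 1j * theta[s_i, n:].
    For CN(0, sigma^2 I_n), log p(x_hat | x) = -||x_hat - x||^2 / sigma^2 + const,
    so its gradient w.r.t. the real and imaginary parts of x is 2 * eps / sigma^2.
    Only the shape of theta is needed, because dx/dtheta is the identity for the
    selected row of the table.
    """
    eps_real = np.concatenate([eps.real, eps.imag], axis=-1)   # (N_T, 2n)
    per_example = losses[:, None] * 2.0 * eps_real / sigma**2
    grad = np.zeros_like(theta)
    np.add.at(grad, s, per_example)    # accumulate rows that share a message index
    return grad / len(s)

rng = np.random.default_rng(0)
M, n, N_T, sigma = 4, 2, 8, 0.1
theta = rng.normal(size=(M, 2 * n))
s = rng.integers(0, M, size=N_T)
eps = sample_perturbations(rng, N_T, n, sigma)
losses = rng.random(N_T)   # placeholder for the per-message losses fed back by the receiver
theta_new = theta - 0.01 * policy_gradient(theta, s, eps, losses, sigma)   # one SGD step
```

Because only the loss values and the known perturbations enter the estimate, the loss itself need not be differentiable, which matches the first note above. For a transmitter with dense layers or a normalization stage, the per-example term would additionally be propagated through the transmitter network by backpropagation.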

The training processes described herein encompass a number of variants. The use of reinforcement learning as described herein relies on exploring the policy space (i.e. the space of possible state to action mappings). As described herein, the policy is the mapping implemented by the transmitter, the state space is the source symbol alphabet and the action space is ℂⁿ. Exploring can be done in numerous ways, two of the most popular approaches being:

Gaussian policy, in which a perturbation vector ε is drawn from a multivariate zero-mean normal distribution and added to the current policy. This ensures exploration “in the neighbourhood” of the current policy.

ε-greedy, in which, with probability 1−ε, the action taken is the one given by the policy, and, with probability ε, a random action is taken.

The covariance matrix of the normal distribution from which the perturbation vector ε is drawn in the Gaussian policy, and the ε parameter of the ε-greedy approach, are usually fixed parameters, i.e., not learned during training. These parameters control the “amount of exploration”, as making these parameters smaller reduces the amount of random exploration, and favours actions from the current policy.
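The two exploration approaches can be contrasted with a short sketch. It assumes an isotropic covariance σ²I for the Gaussian policy and leaves the distribution of the "random action" in the ε-greedy approach to a caller-supplied sampler, since the specification does not fix either choice; all function and variable names are hypothetical.

```python
import random


def gaussian_policy_action(policy_action, sigma, rng):
    """Gaussian policy: add zero-mean Gaussian noise to the policy's action."""
    return [a + rng.gauss(0.0, sigma) for a in policy_action]


def epsilon_greedy_action(policy_action, epsilon, sample_random_action, rng):
    """Epsilon-greedy: take a random action with probability epsilon,
    otherwise follow the current policy."""
    if rng.random() < epsilon:
        return sample_random_action(rng)
    return list(policy_action)


rng = random.Random(0)
policy_action = [0.5, -0.5]  # example transmitter output (assumed values)


def uniform_action(r):
    # Assumed random-action sampler: uniform over [-1, 1] per dimension.
    return [r.uniform(-1.0, 1.0) for _ in policy_action]


explored = gaussian_policy_action(policy_action, sigma=0.1, rng=rng)
greedy = epsilon_greedy_action(policy_action, 0.0, uniform_action, rng)
```

Reducing `sigma` or `epsilon` in this sketch keeps the explored action closer to the action of the current policy, which corresponds to the reduced "amount of exploration" described above.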

In another embodiment of this invention, the goal is not to communicate messages s from a discrete message set but rather vectors s ∈ ℝ^N, which are reconstructed by the receiver. For instance, s could be a digital image and the goal of the receiver is to reconstruct the vector s ∈ ℝ^N as accurately as possible. FIGS. 5 and 6 show the necessary changes to the transmitter and receiver, respectively, to implement this idea.

In this case, the loss functions L_T,i and L_R,i during transmitter and receiver training can be given as

$\begin{matrix} L_{T, i} = \frac{1}{2} \left\| s_{T, i} - {\hat{s}}_{T, i} \right\|_{2}^{2} & (6) \\ L_{R, i} = \frac{1}{2} \left\| s_{R, i} - {\hat{s}}_{R, i} \right\|_{2}^{2} & (7) \end{matrix}$

where s_T,i and s_R,i correspond to the transmitted data vectors and ŝ_T,i and ŝ_R,i are the respective reconstructions. This loss is the so-called mean squared error (MSE).
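As a small worked illustration of the per-example losses (6) and (7), the following sketch computes half the squared Euclidean distance between a data vector and its reconstruction and averages it over a batch. The function names and the list representation are hypothetical.

```python
def squared_error_loss(s, s_hat):
    """Per-example loss: 0.5 * ||s - s_hat||_2^2, in the form of (6) and (7)."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(s, s_hat))


def batch_mean_loss(s_batch, s_hat_batch):
    """Average of the per-example losses over a batch."""
    losses = [squared_error_loss(s, s_hat) for s, s_hat in zip(s_batch, s_hat_batch)]
    return sum(losses) / len(losses)


example_loss = squared_error_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Here only the third component differs, by 2, so the per-example loss is 0.5 · 2² = 2.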

It is also possible that the transmitter sends a data vector s ∈ ℝ^N, but the goal of the receiver is to classify the transmitted vector into one out of M classes. For example, s could be an image and the receiver's goal is to tell whether s contains a dog or a cat. In this embodiment, the realization of the transmitter as in FIG. 5 could be used while the receiver is implemented as in FIG. 3. The loss functions for training would then be chosen as in Section 1.3, with the difference that each transmit vector s has an associated class label l (one of the M classes), which is used to compute the loss, i.e.,

$\begin{matrix} L_{T, i} = - \log ({[p_{T, i}]}_{l_{T, i}}) & (9) \\ L_{R, i} = - \log ({[p_{R, i}]}_{l_{R, i}}) . & (10) \end{matrix}$
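The label-based losses (9) and (10) can be sketched as follows, assuming that the receiver outputs a probability vector p over the M classes and that labels are zero-based indices into that vector (an indexing convention chosen for this sketch only). The function names are hypothetical.

```python
import math


def label_cross_entropy(p, label):
    """Per-example loss -log([p]_label), in the form of (9) and (10)."""
    return -math.log(p[label])


def predicted_class(p):
    """Index of the most probable class in the receiver output p."""
    return max(range(len(p)), key=lambda k: p[k])


p_example = [0.25, 0.75]  # assumed receiver output for M = 2 classes
loss_if_label_1 = label_cross_entropy(p_example, 1)
loss_if_label_0 = label_cross_entropy(p_example, 0)
```

The loss is smaller when the receiver assigns a higher probability to the true label, so `loss_if_label_1` is smaller than `loss_if_label_0` in this example.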

For completeness, FIG. 9 is a schematic diagram of components of one or more of the modules described previously (e.g. the transmitter or receiver neural networks), which hereafter are referred to generically as processing systems 110. A processing system 110 may have a processor 112, a memory 114 closely coupled to the processor and comprised of a RAM 124 and ROM 122, and, optionally, hardware keys 120 and a display 128. The processing system 110 may comprise one or more network interfaces 118 for connection to a network, e.g. a modem which may be wired or wireless.

The processor 112 is connected to each of the other components in order to control operation thereof.

The memory 114 may comprise a non-volatile memory, a hard disk drive (HDD) or a solid state drive (SSD). The ROM 122 of the memory 114 stores, amongst other things, an operating system 125 and may store software applications 126. The RAM 124 of the memory 114 is used by the processor 112 for the temporary storage of data. The operating system 125 may contain code which, when executed by the processor, implements aspects of the algorithms 40, 60 and 80.

The processor 112 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.

The processing system 110 may be a standalone computer, a server, a console, or a network thereof.

In some embodiments, the processing system 110 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The processing system 110 may be in communication with the remote server device in order to utilize the software application stored there.

FIGS. 10a and 10b show tangible media, respectively a removable memory unit 165 and a compact disc (CD) 168, storing computer-readable code which when run by a computer may perform methods according to embodiments described above. The removable memory unit 165 may be a memory stick, e.g. a USB memory stick, having internal memory 166 storing the computer-readable code. The memory 166 may be accessed by a computer system via a connector 167. The CD 168 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.

Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.

Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequential/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor, or firmware such as the programmable content of a hardware device, as instructions for a processor, or as configured or configuration settings for a fixed function device, gate array, programmable logic device, etc.

As used in this application, the term “circuitry” refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.

If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagrams of FIGS. 4, 5 and 7 are examples only and that various operations depicted therein may be omitted, reordered and/or combined.

It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.

Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.

Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.

It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.

Claims

1. An apparatus comprising: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform, obtaining or generating a transmitter-training sequence of messages for a transmission system, wherein the transmission system comprises a transmitter, a channel and a receiver, wherein the transmitter includes a transmitter algorithm having at least some trainable weights and the receiver includes a receiver algorithm having at least some trainable weights; transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on the first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.
2. An apparatus as claimed in claim 1, wherein the training the at least some weights of the transmitter algorithm makes use of a distribution to generate the perturbations applied to the transmitter-training sequence of messages.
3. An apparatus as claimed in claim 1, wherein the perturbations are zero-mean Gaussian perturbations.
4. An apparatus as claimed in claim 1, wherein the first loss function data is related to one or more of block error rate, bit error rate and categorical cross-entropy.
5. An apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform repeating the training of the at least some weights of the transmitter algorithm until a first condition is reached.
6. An apparatus as claimed in claim 5, wherein the first condition is a defined number of iterations and/or a defined performance level.
7. An apparatus as claimed in claim 1, wherein the training further comprises optimising one or more of a batch size of the transmitter-training sequence of messages, a learning rate, and a distribution of the perturbations applied to the perturbed versions of the transmitter-training sequence of messages.
8. An apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform obtaining or generating a receiver-training sequence of messages for transmission over the transmission system; transmitting the receiver-training sequence of messages over the transmission system; generating or obtaining second receiver loss function data, the second receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; and training at least some weights of the receiver algorithm based on the second receiver loss function data.
9. An apparatus as claimed in claim 8, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform repeating the training of the at least some weights of the receiver algorithm until a second condition is reached.
10. An apparatus as claimed in claim 9, wherein the second condition is a defined number of iterations and/or a defined performance level.
11. An apparatus as claimed in claim 8, wherein the second loss function is related to one or more of block error rate, bit error rate and categorical cross-entropy.
12. An apparatus as claimed in claim 8, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform repeating both the training of the at least some weights of the transmitter algorithm and repeating the training of the at least some weights of the transmitter algorithm until a third condition is reached.
13. An apparatus as claimed in claim 1, wherein said at least some weights of the transmit and receive algorithms are trained using stochastic gradient descent.
14. An apparatus comprising: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform, obtaining or generating a receiver-training sequence of messages for transmission over a transmission system, wherein the transmitter includes a transmitter algorithm having at least some trainable weights and the receiver includes a receiver algorithm having at least some trainable weights; transmitting the receiver-training sequence of messages over the transmission system; generating or obtaining second receiver loss function data, the second receiver loss function data being generated based on a receiver-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; training at least some weights of the receiver algorithm based on the second receiver loss function data; obtaining or generating a transmitter-training sequence of messages for the transmission system; transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on the first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.
15. An apparatus as claimed in claim 14, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform repeating the training of the at least some weights of the transmitter algorithm until a first condition is reached and means for repeating the training of the at least some weights of the receiver algorithm until a second condition is reached.
16. An apparatus as claimed in claim 15, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform repeating both the training of the at least some weights of the transmitter algorithm and repeating the training of the at least some weights of the transmitter algorithm until a third condition is reached.
17. An apparatus as claimed in claim 14, wherein the transmitter algorithm comprises a transmitter neural network and/or the receiver algorithm comprises a receiver neural network.
18. A method comprising: obtaining or generating a transmitter-training sequence of messages for a transmission system, wherein the transmission system comprises a transmitter, a channel and a receiver, wherein the transmitter includes a transmitter algorithm having at least some trainable weights and the receiver includes a receiver algorithm having at least some trainable weights; transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.
19. A method as claimed in claim 18, further comprising: obtaining or generating a receiver-training sequence of messages for transmission over the transmission system; transmitting the receiver-training sequence of messages over the transmission system; generating or obtaining second receiver loss function data, the second receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; and training at least some weights of the receiver algorithm based on the second receiver loss function data.
20. A method comprising: obtaining or generating a receiver-training sequence of messages for transmission over a transmission system, wherein the transmitter includes a transmitter algorithm having at least some trainable weights and the receiver includes a receiver algorithm having at least some trainable weights; transmitting the receiver-training sequence of messages over the transmission system; generating or obtaining second receiver loss function data, the second receiver loss function data being generated based on a receiver-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; training at least some weights of the receiver algorithm based on the second loss function; obtaining or generating a transmitter-training sequence of messages for transmission over the transmission system; transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on the first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.
21. A non-transitory computer readable medium storing a computer program comprising instructions, which when executed by a processor, cause an apparatus including the processor to perform the following: obtaining or generating a transmitter-training sequence of messages for a transmission system, wherein the transmission system comprises a transmitter, a channel and a receiver, wherein the transmitter includes a transmitter algorithm and the receiver includes a receiver algorithm; transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.
22. A non-transitory computer readable medium storing a computer program comprising instructions, which when executed by a processor, cause an apparatus including the processor to perform the following: obtaining or generating a receiver-training sequence of messages for transmission over a transmission system, wherein the transmitter includes a transmitter algorithm having at least some trainable weights and the receiver includes a receiver algorithm having at least some trainable weights; transmitting the receiver-training sequence of messages over the transmission system; generating or obtaining second receiver loss function data, the second receiver loss function data being generated based on a receiver-training sequence as received at the receiver and knowledge of the transmitted receiver-training sequence; training at least some weights of the receiver algorithm based on the second loss function; obtaining or generating a transmitter-training sequence of messages for transmission over the transmission system; transmitting perturbed versions of the transmitter-training sequence of messages over the transmission system; receiving first receiver loss function data at the transmitter, the first receiver loss function data being generated based on a received-training sequence as received at the receiver and knowledge of the transmitter training sequence of messages for the transmission system; and training at least some weights of the transmitter algorithm based on the first receiver loss function data and knowledge of the transmitter-training sequence of messages and the perturbed versions of the transmitter-training sequence of messages.

PCT Information

Filing Document	Filing Date	Country	Kind
PCT/IB2018/000814	4/3/2018	WO	00
