This patent application relates generally to systems and methods for facilitating real-time communication, and more particularly, to determining properties of channel input distributions that characterize capacities of channels with memory, with feedback, with or without transmission cost, and without feedback (in certain cases in which feedback does not increase capacity), to designing encoders to achieve capacities of channels with memory, to characterizing nonanticipative rate distortion functions for sources of information with memory, and to utilizing characterized rate distortion functions and channel capacities in Joint Source-Channel Coding (JSCC).
Advancements in information technology and networks are transforming the everyday lives of many people with respect to employment, health care, communication, education, environment, etc. In particular, advancements in information technology and networks have spawned the field of Cyber-Physical Systems (CPSs), which field refers to the next generation of engineering systems integrated via advanced technologies and protocols. These engineering systems are capable of performing ubiquitous computing, communication, and control for complex physical systems and can be implemented in energy systems (e.g., the electric power distribution and smart grids), transportation systems (e.g., traffic networks), health care and medical systems, surveillance networks, control systems for underwater and unmanned aerial vehicles, etc. In many of these applications, sub-systems, sensors or observation posts, and controllers or control stations are distributed, often at distinct locations, and communication among sub-systems is limited. Thus, in the context of such systems, there is a demand for real-time communication, decentralized decisions, and the integration of real-time communication and decentralized decisions into complex networks.
In the field of communications, most encoders and decoders for transmitting information over channels (e.g., for transmitting speech signals over wireless communication channels) are designed based on an assumption that the channels do not have memory. That is, most communication systems are configured based on theories, methods, expressions, etc. assuming that channels of the communication system have a conditional output probability distribution that depends only on a current input (i.e., the output of the channels is conditionally independent of previous channel input or output symbols and the source symbols). However, typical communication channels are not memoryless due to inter-symbol interference (ISI), correlated channel noise, etc. As a result, most communication systems are configured with components (e.g., encoders) that do not operate optimally when transmitting information over channels (e.g., the components do not achieve the capacity of the channels with memory) and are, in many cases, overly complicated due to the lack of knowledge of capacity achieving properties of the encoders. A characterization of channel capacity and corresponding capacity achieving channel input distributions, which would allow for the design of capacity achieving encoders, is not known for most channels with memory.
Further, the field of communications has developed primarily based on the following model: a message is generated randomly by an information source, the message is encoded by an encoder, the message is transmitted as a signal over a noisy channel, and the transmitted signal is decoded to produce an output as an approximation of the message generated by the source. The fundamental problem of this model is to determine simultaneously what information should be transmitted (source coding) and how the information should be transmitted (channel coding) to achieve a desired level of performance. Over the years, this fundamental problem has been separated into the two subproblems of source coding and channel coding. The first sub-problem, source coding, is related to efficient representation of information, such as information representing speech, so as to minimize information storage and to characterization of a minimum rate of compressing the information generated by the source of the information (e.g., via the classical Rate Distortion Function (RDF) of the source subject to a fidelity of reconstruction). The second sub-problem, channel coding or “error correction coding,” is related to a correction of errors arising from channel noise, such as flaws in an information/data storage or transmission system, loss of information packets in networks, failures of communications links, etc., and to the characterization of the maximum rate of information transmission, called “channel capacity.”
The general separation of the fundamental problem into source coding and channel coding sub-problems has divided the community of developers into independent groups developing source codes and channel codes, respectively. Although extremely useful in some contexts, this idealized separation is limiting future advances in communication technology, in that developers are ignoring practical design criteria, such as computational complexity, delay, and optimal performance. Further, the ideal separation of source coding and channel coding is often violated for point-to-point communications over channels with memory and for network communication systems. On the other hand, the optimal design that simultaneously performs data compression and channel coding is, in those cases in which it is known, elegantly simple. However, this optimal design is, in general, hard to find. For example, separation of source and channel coding leads to the design of channel codes which treat all information bits as equally important. However, a scenario in which all information bits are equally important (e.g., in achieving an optimal channel capacity) is rare, and, hence, a separation of source and channel coding can lead to performance degradation.
In an embodiment, a method for characterizing a capacity of a channel with memory and feedback comprises defining a channel model corresponding to the channel, wherein: the channel is utilized to transmit information from a source to a destination, and the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols. The method further includes determining a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization, and solving, by one or more processors of a specially configured computing device, the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions and a per unit time limit of the capacity of the channel.
In another embodiment, a system comprises one or more processors and one or more non-transitory memories. The one or more non-transitory memories store computer-readable instructions that specifically configure the system such that, when executed by the one or more processors, the computer-readable instructions cause the system to: receive a channel model corresponding to the channel, wherein: the channel is utilized to transmit information from a source to a destination, and the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols. The computer-readable instructions further cause the system to: determine a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization, and solve the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions.
The techniques of the present disclosure facilitate a characterization of capacity and capacity achieving channel input distributions for channels with memory, with or without feedback, and with transmission cost. Further, encoders of the present disclosure satisfy necessary and sufficient conditions to achieve the capacity of channels with memory, with or without feedback, and with transmission cost, and methods of the present disclosure include determining whether an optimal transmission, for a given model of channels and transmission cost, is indeed real-time transmission. Encoders of the present disclosure may also be configured such that the encoders simultaneously compress and encode information for sources with memory and/or with zero-delay, by performing a JSCC design.
To this end, a two-step procedure determines the structural properties of (i) capacity achieving encoders, and (ii) capacity achieving channel input distributions, for general channels with memory and with or without feedback encoding and transmission cost. More specifically, the two-step procedure identifies “multi-letter,” “finite block length” feedback capacity expressions along with the corresponding capacity achieving channel input distributions and encoders. By extension, the procedure identifies per unit time limiting feedback capacity formulas along with the corresponding capacity achieving channel input distributions and encoders.
Further, necessary and sufficient conditions allow for the design of an encoder with or without feedback that achieves the capacity of a channel with memory. These encoders, referred to herein as “information lossless” encoders, define a mapping from the information generated by a source to the output of the encoder. The mapping may be invertible such that no information is lost in the encoding process. For each of a plurality of channel classes, the determined capacity achieving input distributions, mentioned above, allow specific mappings (e.g., encoders) to be defined. These specific mappings adhere to the necessary and sufficient conditions for capacity achieving encoders.
Nonanticipative rate distortion functions (RDFs) of the present disclosure may achieve a zero-delay compression of information from a source. These nonanticipative RDFs may represent an optimal (e.g., capacity achieving) compression that is zero-delay. For example, the nonanticipative RDFs of the present disclosure may represent a scheme for compressing information that only depends on previously communicated symbols (e.g., is causal), not on all communicated symbols. The nonanticipative RDFs may also represent compression of both time-varying, or “nonstationary,” and stationary sources of information.
Still further, methods for designing encoding schemes may utilize Joint Source Channel Coding (JSCC) to generate encoding schemes for simultaneous compression and channel coding operating with zero-delay or in real-time. As opposed to having codes for compression and codes for channel coding of compressed information, encoding schemes of the present disclosure may utilize a single encoding scheme, generated via JSCC design, that provides both compression and channel coding with zero-delay. JSCC methods to design such encoding schemes may utilize characterized capacities, necessary and sufficient conditions of information lossless encoders, and nonanticipative RDFs as discussed above.
The source 102 may include one or more stationary or mobile computing or communication devices, in an implementation. For example, the source 102 may include a laptop, desktop, tablet, or other suitable computer generating messages including digital data, such as digital data representing pictures, text, audio, videos, etc. In other examples, the source 102 may be a mobile phone, smartphone, land line phone, or other suitable phone generating messages including signals representative of audio or text messages to one or more other suitable phones or communication devices. Generally, the source 102 may include any number of devices or components of devices generating messages to be encoded by the encoder 104 and transmitted over the channel 106. Further details of an example computing device, which may be implemented as the source 102, are discussed with reference to
In particular, the example source 102 generates messages including source symbols $x^n \triangleq \{x_0, x_1, \ldots, x_n\}$, $x_j \in \mathcal{X}_j$, where $j = 0, 1, \ldots, n$, according to a source distribution $P_{X^n}$.
The encoder 104 may include one or more devices, circuits, modules, engines, and/or routines communicatively and/or operatively coupled to the source 102. For example, the encoder 104 may be communicatively and/or operatively coupled to the source 102 via a bus of a computing device (e.g., a computing device implemented as the source 102), such as an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, a Peripheral Component Interconnect (PCI) bus or a Mezzanine bus, or the Peripheral Component Interconnect Express (PCI-E) bus, or the encoder 104 may be communicatively and/or operatively coupled to the source 102 via one or more wires or cables, such as Ethernet cables, ribbon cables, coaxial cables, etc. In any event, the encoder 104 receives messages (e.g., source symbols xn) from the source 102.
The example encoder 104 encodes the received symbols $x^n$ from the source 102 into channel input symbols $a^n \triangleq \{a_0, a_1, \ldots, a_n\}$, $a_j \in \mathcal{A}_j$, where $j = 0, 1, \ldots, n$. These channel input symbols have “induced” channel input distributions $\{P_{A_i|A^{i-1},B^{i-1}}(da_i \mid a^{i-1}, b^{i-1}) : i = 0, 1, \ldots, n\}$.
The channel 106 may include any number of wires, cables, wireless transceivers, etc., such as Ethernet cables, configured to facilitate a transmission of the encoded source symbols from the encoder 104 to the decoder 108. Further details of various types of channels that may be implemented as the channel 106 are discussed in section B.1. entitled “Characterizing Channels.” The example channel 106 is noisy with memory defined by a sequence of conditional distributions, $\{P_{B_i|B^{i-1},A^{i}}(db_i \mid b^{i-1}, a^{i}) : i = 0, 1, \ldots, n\}$.
The decoder 108 may include one or more devices, modules, engines, and/or routines communicatively and/or operatively coupled to one or more computing devices, phones, etc. to receive decoded information transmitted over the channel 106. The decoder 108 receives the channel output, Bj, and decodes the channel output to produce the decoded output Yj. The decoder 108 may or may not use past channel outputs and decoder outputs to produce Yj. For example, when the source 102 is an originating phone, the decoder 108 may be communicatively and/or operatively coupled to a terminating phone to decode audio signals sent from the originating phone to the terminating phone.
In some implementations, the source 102 and/or encoder 104 may know (e.g., receive transmissions indicative of and/or store information indicative of) all, or at least some outputs (Bj) from the channel 106, before generating, encoding, and/or transmitting a next signal. This functionality may be referred to herein as “feedback.” In other words, a channel or communication system with feedback may be a system in which a source and/or encoder receive indications of, or otherwise “know,” all or at least some previous outputs from the channel before sending a subsequent signal over the channel.
The methods described below allow the “finite block length” feedback capacity for channels, such as the channel 106, with memory and with and without transmission cost constraints on the encoders to be characterized, in some implementations. “Feedback capacity” may refer to a capacity of a channel with feedback, as discussed above, and “finite block length” may refer to a feedback capacity defined in terms of a finite number of transmissions or a finite time period such that the feedback capacity can be defined without taking a limit of infinite transmissions. In characterizing the finite block length feedback capacity, operators of communications systems may also identify the corresponding capacity achieving channel input distributions and the capacity achieving encoders. Operators may also utilize the characterization to determine whether feedback increases the finite block length capacity of channels without feedback, to characterize the finite block length capacity without feedback for general channels with memory and with or without transmission cost constraints on the encoders, and to identify the corresponding capacity achieving channel input distributions and capacity achieving encoders. By extension, the methods described below may allow a per unit time limiting version of the finite block length feedback capacity to be used to determine the capacity and the corresponding capacity achieving channel input distributions.
In some implementations, the characterization of the finite block length feedback capacity may facilitate the design of optimal encoding and decoding schemes that both reduce the complexity of communication systems and operate optimally. Operating optimally may include an optimal operation in terms of the overall number of processing elements (e.g., CPUs) required to process transmissions and/or the number of memory elements and steps required to encode and decode messages. Such optimal encoding and decoding schemes may require small processing delays and short code lengths in comparison to encoding and decoding schemes designed based on an assumption of a channel without memory or based on a separate treatment of source codes and channel codes.
Architectures and methodologies discussed herein provide encoders and decoders based on channel characteristics, transmission cost requirements, and the characteristics of messages generated by a source (e.g., the source 102). Although the characterizations, encoders, distributions, etc. discussed below are described, by way of example, with reference to the example communication system 100, which system 100 is a point-to-point communication system, characterizations, encoders, distributions, etc. may be applied to or implemented in systems other than point-to-point communication systems. For example, the characterizations, encoders, distributions, etc. of the present disclosure may be implemented in multi-user and network communication systems by repeating the procedures described below (for each user, for each node of a network, etc.).
A determination and characterization of the capacity of a channel, such as the channel 106, may first include a characterization or description of the channel. The techniques of the present disclosure characterize a plurality of channel types with memory and with or without feedback allowing the capacity of these channels to be determined along with capacity achieving encoders and distributions, in some implementations. Generally, the characterization of a channel includes: (i) an “alphabet” defining the signal spaces (e.g., inputs and outputs) of a communication system; (ii) a “transmission cost function” defining a dependence of a rate of information transfer over the channel on the amount of energy or, more generally, any cost imposed on symbols transferred over the channel; and (iii) a model for the channel described by, for example, conditional distributions or stochastic discrete-time recursions in linear, nonlinear and state space form.
Channel input and output alphabets may, in some implementations, be complete separable metric spaces, such as function spaces of finite energy signals. These metric spaces may include, by way of example and without limitation, continuous alphabets, countable alphabets, and finite alphabets, such as real-valued $\mathbb{R}^p$-dimensional and/or complex-valued $\mathbb{C}^p$-dimensional alphabets for channel output alphabets and real-valued $\mathbb{R}^q$-dimensional and/or complex-valued $\mathbb{C}^q$-dimensional alphabets for channel input alphabets, finite energy or power signals, and bounded energy signals defined on metric spaces.
Transmission cost functions may include, by way of example, nonlinear functions of past and present channel inputs and past channel outputs or conditional distributions. The transmission cost functions define a cost of transmitting certain symbols over a channel, which cost is generally not the same for all symbols. For example, an energy, or other cost, required to transmit one symbol may differ from an energy required to transmit another symbol. Operators of a communication system may determine transmission cost functions, to utilize in the methods discussed below, by sending certain information (e.g., diagnostic or configuration information) over a channel and measuring an output of the channel.
Similarly, by sending certain information over a channel, operators of a communication system may determine a channel model. The channel model may model the behavior of the channel including a dependence (e.g., non-linear dependence) of transmission on past channel inputs, outputs, and noise processes with memory. By way of example, the channel model may be a linear channel model with arbitrary memory, a Gaussian channel model, a state space model, or an arbitrary conditional distribution defined on countable, finite channel input and output alphabets, or continuous alphabet spaces. For waveform signals (e.g., continuous speech signals) transferred over a channel, the channel model may be a non-linear differential equation, and, for quantized signals (e.g., zeros and ones) transferred over a channel, the channel model may be a conditional distribution.
Models of channels with memory and feedback may include, by way of example, the following channel conditional distributions defining certain classes of channels:
Class A.1: $P_{B_i|B^{i-1},A^i}(db_i \mid b^{i-1}, a^i) = P_{B_i|B^{i-1},A_i}(db_i \mid b^{i-1}, a_i)$;

Class A.2: $P_{B_i|B^{i-1},A^i}(db_i \mid b^{i-1}, a^i) = P_{B_i|B^{i-1},A_{i-L}^{i}}(db_i \mid b^{i-1}, a_{i-L}^{i})$;

Class B.1: $P_{B_i|B^{i-1},A^i}(db_i \mid b^{i-1}, a^i) = P_{B_i|B_{i-1},A_i}(db_i \mid b_{i-1}, a_i)$;

Class B.2: $P_{B_i|B^{i-1},A^i}(db_i \mid b^{i-1}, a^i) = P_{B_i|B_{i-M}^{i-1},A_i}(db_i \mid b_{i-M}^{i-1}, a_i)$;

Class B.3: $P_{B_i|B^{i-1},A^i}(db_i \mid b^{i-1}, a^i) = P_{B_i|B_{i-M}^{i-1},A_{i-L}^{i}}(db_i \mid b_{i-M}^{i-1}, a_{i-L}^{i})$;

Class C.1: $P_{B_i|B^{i-1},A^i}(db_i \mid b^{i-1}, a^i) = P_{B_i|B_{i-2}^{i-1},A_{i-1}^{i}}(db_i \mid b_{i-2}^{i-1}, a_{i-1}^{i})$;

Class C.2: $P_{B_i|B^{i-1},A^i}(db_i \mid b^{i-1}, a^i) = P_{B_i|B_{i-M}^{i-1},A_{i-N}^{i}}(db_i \mid b_{i-M}^{i-1}, a_{i-N}^{i})$;

for $i = 0, \ldots, n$, where $\{L, M, N\}$ are nonnegative finite integers. The above example classes of channel conditional distributions may be induced by nonlinear channel models and linear time-varying Autoregressive models or by linear and nonlinear channel models expressed in state space form.
Classes of transmission cost functions may include, by way of example, the following:

Class A.1: $\gamma_i(T^n a_i, T^n b_{i-1}) = \gamma_i^{A.1}(a_i, b^{i-1})$, $i = 0, \ldots, n$;

Class A.2: $\gamma_i(T^n a_i, T^n b_{i-1}) = \gamma_i^{A.2}(a_{i-N}^{i}, b^{i-1})$, $i = 0, \ldots, n$;

Class B.1: $\gamma_i(T^n a_i, T^n b_{i-1}) = \gamma_i^{B.1}(a_i, b_{i-1})$, $i = 0, \ldots, n$;

Class B.2: $\gamma_i(T^n a_i, T^n b_{i-1}) = \gamma_i^{B.2}(a_i, b_{i-K}^{i-1})$, $i = 0, \ldots, n$;

Class B.3: $\gamma_i(T^n a_i, T^n b_{i-1}) = \gamma_i^{B.3}(a_{i-N}^{i}, b_{i-K}^{i-1})$, $i = 0, \ldots, n$;

Class C.1: $\gamma_i(T^n a_i, T^n b_{i-1}) = \gamma_i^{C.1}(a_{i-1}^{i}, b_{i-2}^{i-1})$, $i = 0, \ldots, n$;

Class C.2: $\gamma_i(T^n a_i, T^n b_{i-1}) = \gamma_i^{C.2}(a_{i-N}^{i}, b_{i-K}^{i-1})$, $i = 0, \ldots, n$;

where $\{N, K\}$ are nonnegative finite integers.
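For reference, the average transmission cost constraint defining the admissible sets used throughout may be written in the following common form (a sketch consistent with the notation above; the exact display of the disclosure is not reproduced here):

$$\mathcal{P}_{[0,n]}(\kappa) \triangleq \Big\{ P_{A_i|A^{i-1},B^{i-1}},\ i = 0, \ldots, n : \frac{1}{n+1}\, \mathbf{E}\Big\{\sum_{i=0}^{n} \gamma_i(T^n a_i, T^n b_{i-1})\Big\} \le \kappa \Big\}, \quad \kappa \ge 0.$$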
If $M = 0$ in the above example classes of channels and transmission cost functions, then

$$P_{B_i|B_{i-M}^{i-1},A_i}(db_i \mid b_{i-M}^{i-1}, a_i)\big|_{M=0} \equiv P_{B_i|A_i}(db_i \mid a_i),$$

$$P_{B_i|B_{i-M}^{i-1},A_{i-L}^{i}}(db_i \mid b_{i-M}^{i-1}, a_{i-L}^{i})\big|_{M=L=0} \equiv P_{B_i|A_i}(db_i \mid a_i),$$

which defines a memoryless channel. Similarly, if $K = 0$ then

$$\gamma_i^{C.2}(a_{i-N}^{i}, b_{i-K}^{i-1})\big|_{K=0} \equiv \gamma_i^{C.2}(a_{i-N}^{i}), \quad i = 0, \ldots, n.$$
B.2. Capacity without Feedback

For further clarification, various notations and expressions related to capacities without feedback are presented below. The problem of the capacity of channels with memory and without feedback includes a maximization of mutual information between channel input and output sequences over an admissible set of channel input distributions. The mutual information may be:
The admissible set of channel input distributions may be:
$$\mathcal{P}^{noFB}_{[0,n]} \triangleq \{P_{A_i|A^{i-1}}(da_i \mid a^{i-1}) : i = 0, 1, \ldots, n\},$$
and the maximization of mutual information may be expressed as:
where $P_{A_i|A^{i-1}}(da_i \mid a^{i-1})$, $i = 0, 1, \ldots, n$, denote the channel input distributions without feedback.
Further, the extremum problems of capacity of channels with memory without feedback, when transmission cost is imposed on channel input distributions, may be written, by way of example, as:
In this notation, $C^{noFB}_{A^n;B^n}(\kappa)$ denotes the finite block length capacity without feedback subject to an average transmission cost constraint $\kappa$.
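Although the displayed extremum problems are not reproduced above, they may be sketched in this notation as:

$$C^{noFB}_{A^n;B^n} \triangleq \sup_{\mathcal{P}^{noFB}_{[0,n]}} I(A^n; B^n), \qquad C^{noFB}_{A^n;B^n}(\kappa) \triangleq \sup_{\mathcal{P}^{noFB}_{[0,n]} \cap\, \mathcal{P}_{[0,n]}(\kappa)} I(A^n; B^n).$$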
B.3. Capacity with Feedback
For still further clarification, various notations and expressions related to capacities with feedback are presented below. The problem of the capacity of channels with memory and with feedback includes a maximization of directed information from channel input and output sequences over an admissible set of channel input distributions. The directed information may be written as:
and the admissible set of channel input distributions may be expressed as:
$$\mathcal{P}_{[0,n]} \triangleq \{P_{A_i|A^{i-1},B^{i-1}}(da_i \mid a^{i-1}, b^{i-1}) : i = 0, 1, \ldots, n\}.$$
The maximization of directed information may be expressed as:
For each $i = 0, 1, \ldots, n$, $P_{A_i|A^{i-1},B^{i-1}}(da_i \mid a^{i-1}, b^{i-1})$ denotes the channel input distribution with feedback.
Further, the extremum problems of capacity of channels with memory and with feedback, when transmission cost is imposed on channel input distributions, may be written, by way of example, as:
In this notation, $C^{FB}_{A^n \to B^n}(\kappa)$ denotes the finite block length feedback capacity subject to an average transmission cost constraint $\kappa$.
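For reference, directed information and the corresponding extremum problems may be sketched, using Massey's standard definition and the notation above, as:

$$I(A^n \to B^n) \triangleq \sum_{i=0}^{n} I(A^i; B_i \mid B^{i-1}), \qquad C^{FB}_{A^n \to B^n} \triangleq \sup_{\mathcal{P}_{[0,n]}} I(A^n \to B^n), \qquad C^{FB}_{A^n \to B^n}(\kappa) \triangleq \sup_{\mathcal{P}_{[0,n]} \cap\, \mathcal{P}_{[0,n]}(\kappa)} I(A^n \to B^n).$$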
In the process 200, a channel model 202 and a transmission cost function 204 are input into a two-step characterization procedure 206. The channel model 202 and transmission cost function 204 may describe one of the plurality of channel classes and transmission cost function classes discussed in section B.1. entitled “Characterizing Channels.” In particular, the channel model 202 may describe the noise of a channel and dependencies among channel inputs and channel outputs via one or more conditional distributions and/or one or more non-linear differential equations. The transmission cost function 204 may define a cost of sending each of a plurality of symbols (e.g., defined in an alphabet) over the channel described by the channel model 202.
The two-step characterization procedure 206 may, based on the channel model 202 and the transmission cost function 204, produce an optimal channel input distribution 208 and a finite block length feedback capacity 210, in the scenario illustrated in
In some implementations, the process 200 also includes a per unit time conversion 212 of the optimal channel input distribution 208 and the finite block length feedback capacity 210. This per unit time conversion 212 converts the finite block length feedback capacity 210 to a feedback capacity 214, which feedback capacity 214 describes the capacity of the channel as a number of transmissions approaches infinity. The per unit time conversion 212 additionally generates a feedback capacity achieving input distribution 216 corresponding to the feedback capacity 214. Although the feedback capacity 214 is illustrated in
As discussed with reference to
Given a specific channel distribution and a specific transmission cost function from the classes described in section B.1. entitled “Characterizing Channels,” step one of the two-step characterization procedure 206 may include applying stochastic optimal control to show that a certain joint process, which generates the information structure of the channel input distribution at each time, is an extended Markov process. For example, for the channel input distribution (at every time) describing a channel with memory:
$$P_{A_i|A^{i-1},B^{i-1}}(da_i \mid a^{i-1}, b^{i-1}), \quad i = 0, 1, \ldots, n,$$

the joint process generating the information structure $\mathcal{I}_i^{P} \subset \{a^{i-1}, b^{i-1}\}$, for $i = 0, 1, \ldots, n$, is an extended Markov process, with respect to a smaller information structure $\mathcal{I}_i \subset \{a^{i-1}, b^{i-1}\}$, for $i = 0, 1, \ldots, n$. Based on this joint process, the optimal channel input distribution corresponding to $C^{FB}_{A^n \to B^n}$ belongs to the subsets

$$\mathcal{P}^{\circ}_{[0,n]} \subset \{\pi_i(da_i \mid a^{i-1}, b^{i-1}) : i = 0, \ldots, n\} \equiv \mathcal{P}_{[0,n]} \quad \text{and} \quad \mathcal{P}^{\circ}_{[0,n]}(\kappa) \subset \mathcal{P}_{[0,n]}(\kappa).$$
Thus, step one of the two-step characterization procedure 206 narrows all possible channel input distributions to specific subsets of input distributions, where the subsets of input distributions include the optimal channel input distribution.
Further, in some implementations of the two-step characterization procedure 206, the first step may include a characterization of capacity (e.g., finite block length feedback capacity) corresponding to the determined subsets of input distributions. In particular, step one of the two-step characterization procedure 206 may include generating a formula or other expression representing a capacity corresponding to the determined subsets of input distributions. For the channel input distribution describing a channel with memory (discussed above), the formulas for finite block length feedback capacity and feedback capacity with and without transmission cost are:
where $I(A^n \to B^n) = I(\pi, P)$ is a specific functional of the channel input distribution $\pi \in \mathcal{P}^{\circ}_{[0,n]}$ and the channel conditional distribution $P \in \{\text{Class A, Class B, Class C}\}$.
Step two of the two-step characterization procedure 206 includes applying a variational equality of directed information to the subsets of input distributions determined in step one. In this manner, step two of the two-step characterization procedure 206 further narrows the determined subsets of input distributions, in some implementations. In particular, for the example above involving a channel with memory, an upper bound that is achievable over the determined subsets of input distributions is expressed as:
$$\dot{\mathcal{P}}_{[0,n]} \subset \mathcal{P}^{\circ}_{[0,n]} \quad \text{and} \quad \dot{\mathcal{P}}_{[0,n]}(\kappa) \subset \mathcal{P}^{\circ}_{[0,n]}(\kappa).$$
Based on such an upper bound or further narrowing of input distributions, step two may further include determining a refined capacity based on the narrowed input distributions. For the example above of a channel with memory, the characterization of finite block length feedback capacity and feedback capacity with and without transmission cost, obtained from step two is:
where $I(A^n \to B^n) = I(\dot{\pi}, P)$ is a specific functional of the channel input distribution $\dot{\pi} \in \dot{\mathcal{P}}_{[0,n]}$ and the channel conditional distribution $P \in \{\text{Class A, Class B, Class C}\}$ as described in section B.1. entitled “Characterizing Channels.”
In the method 300, a computing device or operator receives a channel model and transmission cost function (block 302). The received channel model may include a channel conditional distribution, such as one of the channel conditional distributions $P \in \{\text{Class A, Class B, Class C}\}$ as described in section B.1. entitled “Characterizing Channels,” or the channel model may include other types of functions, such as non-linear differential equations, for channels transmitting continuous signals (e.g., speech signals). Also, the transmission cost function may include one of the transmission cost functions as described in section B.1. entitled “Characterizing Channels.”
The computing device and/or operator then applies step one of a two-step procedure (block 304). In step one, the computing device and/or operator utilizes stochastic optimal control to determine subsets of input distributions and a corresponding capacity formula based on the received channel model and transmission cost function. The determined subsets of input distributions may include the optimal channel input distributions. That is, the determined subsets of input distributions may include the channel input distribution that achieves the corresponding capacity of the channel. Further details of step one of the two-step procedure are discussed with reference to
For certain channel models and transmission cost functions, step one may be sufficient to identify the information structures of channel input distributions and characterize a capacity (e.g., the finite block length feedback capacity and the feedback capacity with and without transmission cost) of the channel. In other cases, step one may only be an intermediate step before step two of the two-step procedure. More specifically, step one may be sufficient for channel conditional distributions which only depend on all previous channel outputs and costs (e.g., not a current and/or previous channel input). As such, the method 300 includes determining if the channel, described by the received channel model, is dependent on more than the previous channel outputs and costs (block 306). If the channel is only dependent on all previous channel outputs and costs, the flow continues to block 310. If the channel is dependent on more than the previous channel outputs and costs (e.g., channel inputs), the flow continues to block 308.
At block 308, the computing device and/or operator applies step two of the two-step procedure. In step two, the computing device and/or operator utilizes a variational equality to further narrow the subsets of input distributions determined at block 304. Further, the computing device and/or operator determines a refined capacity formula, such as a finite block length feedback capacity, based on the further narrowed subsets of input distributions. In some implementations, the further narrowing of the input distributions based on the variational equality includes identifying a single optimal channel input distribution, and, in other implementations, the further narrowing of the input distributions based on the variational equality includes identifying a smaller subset of channel input distributions including fewer distributions than the subsets determined at block 304.
The computing device and/or operator solves the capacity formula (determined at block 304) and/or refined capacity formula (determined at block 308) to determine the capacity of the channel (block 310). The capacity formulas solved at block 310 may include maximizations or other extremum problems, as further discussed above. The computing device and/or operator may utilize various techniques to solve these maximizations or extremum problems including, by way of example and without limitation, dynamic programming and/or stochastic calculus of variations (e.g., the stochastic maximum principle).
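To make such an extremum problem concrete, the following minimal Python sketch brute-forces the maximization of directed information for a hypothetical binary channel of example class B.1 over a grid of time-invariant input policies. The channel matrix Q, the horizon, and the restriction to time-invariant policies are illustrative assumptions only, not part of the disclosure; dynamic programming, as discussed above, scales better than this exhaustive search.

import itertools
import numpy as np

# Hypothetical binary channel of class B.1:
# Q[b_prev, a, b] = P(B_i = b | B_{i-1} = b_prev, A_i = a).
Q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])

def directed_information(p, n=10, b_init=0):
    """Compute sum_{i=0}^{n} I(A_i; B_i | B_{i-1}) for a time-invariant
    input policy p, where p[b_prev] = P(A_i = 1 | B_{i-1} = b_prev)."""
    pi = np.zeros(2)
    pi[b_init] = 1.0  # distribution of B_{i-1}; the output process is Markov here
    total = 0.0
    for _ in range(n + 1):
        for b_prev in range(2):
            if pi[b_prev] == 0.0:
                continue
            pa = np.array([1.0 - p[b_prev], p[b_prev]])  # P(a | b_prev)
            pb = pa @ Q[b_prev]                          # P(b | b_prev)
            for a in range(2):
                for b in range(2):
                    if pa[a] > 0.0:
                        total += (pi[b_prev] * pa[a] * Q[b_prev, a, b]
                                  * np.log2(Q[b_prev, a, b] / pb[b]))
        # propagate P(B_i) using the induced Markov transition kernel
        pi = np.array([sum(pi[bp] * ((1 - p[bp]) * Q[bp, 0, b] + p[bp] * Q[bp, 1, b])
                           for bp in range(2)) for b in range(2)])
    return total

# Brute-force search over a grid of time-invariant policies.
grid = np.linspace(0.0, 1.0, 51)
best_value, best_policy = max(
    (directed_information((p0, p1)), (p0, p1))
    for p0, p1 in itertools.product(grid, grid))
print(f"approximate directed information over 11 uses: {best_value:.4f} bits, policy {best_policy}")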
In the method 400, a computing device and/or operator determines, for a specific channel described by a channel model, a process which generates a structure of the channel input distribution (block 402). The computing device and/or operator may utilize techniques from stochastic optimal control to optimize a “pay-off” (e.g., a finite block length capacity of the specific channel) over all possible processes (e.g., distributions). As discussed above for an example channel with memory, the determination of the process may include demonstrating that a certain joint process which generates the information structure of the channel input distribution at each time is an extended Markov process.
The computing device and/or operator may then determine a smaller set of processes (e.g., distributions) optimizing the capacity (block 404). This smaller set of distributions or processes includes the optimal distribution or process that achieves the finite block length capacity, or “pay-off.” For example, the smaller set of distributions may include only some of a complete set of possible channel input distributions, where the smaller set, or subset, includes the optimal channel input distribution.
The computing device and/or operator also determines a capacity formula based on the determined subsets of processes or distributions (block 406). As discussed further in sections B.2. and B.3. entitled “Capacity Without Feedback” and “Capacity With Feedback,” respectively, formulas for capacities may be expressed in terms of channel input distributions. Thus, upon determining subsets of channel input distributions at block 404, the computing device and/or operator may generate a formula for capacity, for the specific channel, based on the subsets of channel input distributions.
In the method 500, a computing device and/or operator applies a variational equality to subsets of input distributions (block 502). For example, the computing device and/or operator may apply the variational equality of directed information further described in C. D. Charalambous et al., “Directed information on abstract spaces: Properties and variational equalities,” http://arxiv.org/abs/1302.3971, submitted Feb. 16, 2013. Such an application of a variational equality may generate an upper bound over the subsets of input distributions (block 504). That is, the application of the variational equality further narrows subsets of input distributions (e.g., determined according to the example method 400) based on techniques specific to information theory (e.g., directed information). The computing device and/or operator also determines a refined capacity formula based on the further narrowed subsets of input distributions (block 506).
By way of example, example finite block length feedback capacity formulas and input distributions, for example classes of channels, determined according to the two-step procedure (described with reference to
For an example channel conditional distribution,
$$\{P_{B_i|B^{i-1},A_i}(db_i \mid b^{i-1}, a_i) : i = 0, 1, \ldots, n\},$$
the optimal channel input distribution corresponding to $C^{FB}_{A^n \to B^n}$ is included in the subset

$$\mathcal{P}^{A.1}_{[0,n]} \triangleq \{P_{A_i|B^{i-1}}(da_i \mid b^{i-1}) : i = 0, \ldots, n\} \subset \mathcal{P}_{[0,n]}.$$
As such, for each $i = 0, 1, \ldots, n$, the information structure of the maximizing channel input distribution (according to the example two-step procedure described in section B.5. entitled “Two-Step Procedure for Characterizing Capacity and Identifying Optimal Input Distributions”) is:

$$\mathcal{I}_i^{P} \triangleq \{b^{i-1}\} \subset \{a^{i-1}, b^{i-1}\}.$$
The characterization of the finite block length feedback capacity is then
If a transmission cost, such as

$$\gamma_i^{A.1}(a_i, b^{i-1}), \quad \gamma_i^{B.1}(a_i, b_{i-1}), \quad \gamma_i^{B.2}(a_i, b_{i-K}^{i-1}), \quad i = 0, 1, \ldots, n,$$

is imposed, then the example characterization of the finite block length feedback capacity is:
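Consistent with the narrowed set $\mathcal{P}^{A.1}_{[0,n]}$, the two characterizations referenced above may be sketched (the exact displays of the disclosure are not reproduced here) as:

$$C^{FB,A.1}_{A^n \to B^n} \triangleq \sup_{\mathcal{P}^{A.1}_{[0,n]}} \sum_{i=0}^{n} I(A_i; B_i \mid B^{i-1}), \qquad C^{FB,A.1}_{A^n \to B^n}(\kappa) \triangleq \sup_{\mathcal{P}^{A.1}_{[0,n]} \cap\, \mathcal{P}_{[0,n]}(\kappa)} \sum_{i=0}^{n} I(A_i; B_i \mid B^{i-1}).$$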
Channel distributions of example class A.1 (as defined in section B.1. entitled “Characterizing Channels”) may include one or both of distributions defined on finite and countable alphabet spaces and distributions defined on continuous alphabet spaces, which distributions may be induced by the models described below.
Let $\{h_i(\cdot,\cdot,\cdot) : i = 0, 1, \ldots\}$ be measurable functions, and let $\{V_i : i = 0, 1, \ldots\}$ be a random channel noise process with joint distribution $P_{V^n}(dv^n)$.
A recursive expression as follows may define an example nonlinear channel model, for a channel in the example class A.1, with a continuous alphabet:

$$B_i = h_i(B^{i-1}, A_i, V_i), \quad i = 0, \ldots, n,$$

where transmission in the example model begins at time $i = 0$, and the initial data

$$B^{-1} \triangleq b_{-\infty}^{-1}, \quad A^{0} \triangleq a^{0}, \quad V^{-1} \triangleq v^{-1}$$

are specified according to the convention utilized in the model. For example, this data may be taken to be the null set of data or any available data prior to transmission time $i \in \{-\infty, \ldots, -2, -1\}$.
If the channel noise process $\{V_i : i = 0, 1, \ldots\}$ is an independent sequence (e.g., not necessarily stationary), then the above recursive expression for the example nonlinear channel model may be utilized in step one of the two-step process with the channel probability distribution
Another recursive expression as follows may define an example linear channel model, for a channel in the example class A.1:
where, for each $i = 0, 1, \ldots, n$, the coefficients $\{C_{i,j}, D_{i,i} : i = 0, 1, \ldots, n,\ j = 0, 1, \ldots, i-1\}$ are real-valued matrices with dimensions $p \times p$ and $p \times q$, respectively (e.g., with $\{p, q\}$ being positive integers).
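The elided recursion can plausibly be reconstructed from the induced distribution in section B.6.1.2.1 and the recursions in section B.7.1.2 (the minus sign on the $C_{i,j}$ terms is an inference from those expressions, not a reproduction of the original display):

$$B_i = -\sum_{j=0}^{i-1} C_{i,j} B_j + D_{i,i} A_i + V_i, \quad i = 0, 1, \ldots, n.$$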
With such a linear model, the channel input distribution is obtained from
and the finite block length feedback capacity is characterized using step one of the two-step procedure, as further described in section B.5. entitled “Two-Step Procedure for Characterizing Capacity and Identifying Optimal Input Distributions.”
Another example linear channel model, defined by linear dynamics, is:
An example channel noise is non-Gaussian, independent:

$$P_{V^n}(dv^n) = \prod_{i=0}^{n} P_{V_i}(dv_i),$$

with a zero mean and covariance matrix:

$$\mu_{V_i} \triangleq \mathbf{E}\{V_i\} = 0, \quad K_{V_i} \triangleq \mathbf{E}\{V_i V_i^{T}\}, \quad i = 0, \ldots, n,$$

and an average transmission cost
B.6.1.2.1. Example MIMO, ANonGN Channel with Memory
For an example channel model as described above with memory, the induced channel conditional distribution is given by:

$$\mathbf{P}\{B_i \le b_i \mid B^{i-1} = b^{i-1}, A^{i} = a^{i}\} = \mathbf{P}\Big\{V_i \le b_i + \sum_{j=0}^{i-1} C_{i,j} b_j - D_{i,i} a_i\Big\}, \quad i = 0, 1, \ldots, n,$$
and a characterization, according to step one of the two-step procedure, of the finite block length feedback capacity, with transmission cost, is given by:
The information structure of the example channel input distribution

$$\{\pi_i(da_i \mid b^{i-1}) \equiv P_{A_i|B^{i-1}}(da_i \mid b^{i-1}) : i = 0, 1, \ldots, n\}$$

implies that a measurable function $e_i(b^{i-1}, \cdot)$ exists, where $\{U_i : i = 0, 1, \ldots, n\}$ is a $p$-dimensional random process with distribution $\{P_{U_i}(du_i) : i = 0, 1, \ldots, n\}$ such that

$$\mathbf{P}\{U_i : e_i(b^{i-1}, U_i) \in da_i\} = P_{A_i|B^{i-1}}(da_i \mid b^{i-1}), \quad i = 0, \ldots, n.$$
Further, according to the definition of the linear channel model, $B_i$, one may define a class of example admissible functions:

$$\mathcal{E}_{[0,n]}^{ANonGN\text{-}A.1\text{-}IL}(\kappa) \triangleq \{e_i(b^{i-1}, u_i),\ i = 0, \ldots, n : \text{for a fixed } b^{i-1} \text{ the function } e_i(b^{i-1}, \cdot) \text{ is one-to-one and onto for } i = 0, \ldots, n, \text{ and the average transmission cost constraint corresponding to } \kappa \text{ is satisfied}\}.$$
An alternative characterization of the finite block length capacity with transmission cost is:
The characterization, or capacity formula, may be solved to find the capacity. For example, a computing device may solve the maximization with dynamic programming or another suitable method as further discussed with reference to
B.6.1.2.2. Example AGN Channel with Memory
In another example case, the channel noise process is Gaussian,

$$V_i \sim N(0, K_{V_i}),$$
or approximately Gaussian. By the entropy maximizing property of Gaussian distributions, the finite block length feedback capacity is bounded from above by the inequality
$$H(B^n) \le H(B^{g,n}),$$
where

$$B^{g,n} \triangleq \{B_i^{g} : i = 0, 1, \ldots, n\}$$
is Gaussian distributed. This upper bound may be achieved when
$$\{\pi_i(da_i \mid b^{i-1}) \equiv P_{A_i|B^{i-1}}(da_i \mid b^{i-1}) : i = 0, \ldots, n\}$$
is conditional Gaussian and the average transmission cost is satisfied, implying that
$$\{P_{B_i|B^{i-1}}(db_i \mid b^{i-1}) : i = 0, \ldots, n\}$$
is also conditionally Gaussian.
Similar to the procedure described in section B.6.1.2.1, a measurable function
exists such that
$$\mathbf{P}\{U_i : e_i(b^{i-1}, U_i) \in da_i\} = P_{A_i|B^{i-1}}(da_i \mid b^{i-1}).$$
Because the channel output is defined by the example linear channel model, Bi,
Moreover, the corresponding channel input process denoted by
$$A^{g,n} \triangleq \{A_i^{g} : i = 0, 1, \ldots, n\}$$
is Gaussian distributed, satisfying the average transmission cost constraint. Also, $\{U_i : i = 0, 1, \ldots, n\}$ is Gaussian, independent of $B^{g,i-1}$ for any $i = 0, 1, \ldots, n$, and
In terms of

$$\Gamma^{i} \triangleq [\Gamma_{i,0}\ \Gamma_{i,1}\ \cdots\ \Gamma_{i,i-1}], \quad K_{B^{i-1}} \triangleq \mathbf{E}\{B^{g,i-1}(B^{g,i-1})^{T}\},$$
the average transmission cost may be:
Thus, the finite block length feedback capacity formula, in this example case, is characterized by:
The covariance matrices $\{K_{B^{g,i-1}} : i = 0, 1, \ldots, n\}$ may be found from the recursion for $B_i^{g}$.
If a process {Xi: i=0, 1, . . . , n} of a source, such as the source 102, intended for transmission over this channel is Rp-valued, Gaussian distributed, and Markov, i.e.,
$$P_{X_i|X^{i-1}}(dx_i \mid x^{i-1}) = P_{X_i|X_{i-1}}(dx_i \mid x_{i-1}), \quad i = 0, 1, \ldots, n,$$
and the matrices
$$\{\Gamma_{i,j}^{*}, K_{U_i}^{*} : i = 0, \ldots, n,\ j = 0, \ldots, i-1\}$$
are the matrices maximizing the above expression, then the coding scheme which achieves the finite block length feedback capacity, in this case, is:
For an example channel conditional distribution,

$$\{P_{B_i|B^{i-1},A_{i-L}^{i}}(db_i \mid b^{i-1}, a_{i-L}^{i}) : i = 0, \ldots, n\},$$
the optimal channel input distribution for the finite block length feedback capacity, $C^{FB}_{A^n \to B^n}$, is included in the subset

$$\mathcal{P}^{A.2}_{[0,n]} \triangleq \{P_{A_i|A_{i-L}^{i-1},B^{i-1}}(da_i \mid a_{i-L}^{i-1}, b^{i-1}) : i = 0, \ldots, n\} \subset \mathcal{P}_{[0,n]},$$
where

$$P_{A_i|A_{i-L}^{i-1},B^{i-1}}(da_i \mid a_{i-L}^{i-1}, b^{i-1}), \quad i = 0, 1, \ldots, L,$$

may be determined from the convention used in the channel model. For example:

$$P_{A_i|A_{i-L}^{i-1},B^{i-1}}(da_i \mid a_{i-L}^{i-1}, b^{i-1}) = P_{A_i|A_{0}^{i-1},B^{i-1}}(da_i \mid a_{0}^{i-1}, b^{i-1}), \quad i = 0, 1, \ldots, L.$$
The characterization of the finite block length feedback capacity, in this case, is:
If a transmission cost is imposed corresponding to any
$$\gamma_i^{A.2}(a_{i-N}^{i}, b^{i-1}), \quad \gamma_i^{C.2}(a_{i-N}^{i}, b_{i-K}^{i-1}), \quad i = 0, 1, \ldots, n,$$
then the characterization of the finite block length feedback capacity with transmission cost, in this example case, is:
Similar to the models discussed in section B.6.1.1, channel distributions of example class A.2 (as defined in section B.1. entitled “Characterizing Channels”) may include one or both of distributions defined on finite and countable alphabet spaces and distributions defined on continuous alphabet spaces, which distributions may be induced by the models described below.
Let $\{h_i(\cdot,\cdot,\cdot) : i = 0, 1, \ldots\}$ be measurable functions, and let $\{V_i : i = 0, 1, \ldots\}$ be a random channel noise process with joint distribution $P_{V^n}(dv^n)$ on $\mathcal{V}^{(n)}$.
Recursive expressions may define a nonlinear channel model for the example channel class A.2 as follows:

$$B_i = h_i(B^{i-1}, A_{i-L}^{i}, V_i), \quad i = 1, \ldots, n, \qquad B_0 = h_0(B^{-1}, A_{-L}^{0}, V_0),$$

where transmission in the example model begins at time $i = 0$, and the initial data are specified by a pre-determined convention. If the channel noise process $\{V_i : i = 0, 1, \ldots\}$ is an independent sequence (e.g., not necessarily stationary), then the above nonlinear channel model may be applied in step one of the two-step process with the induced channel probability distribution:
B.6.2.2. Example MIMO AGN Channel with Memory
For a linear version of the example channel class A.2, the channel output may be modeled as:
The channel noise process, in this example case, is taken to be Gaussian distributed, i.e.,
$$P_{V^n}(dv^n) = \prod_{i=0}^{n} P_{V_i}(dv_i),$$
and the average transmission cost is taken to be:
This is a generalization of the example channel model discussed in section B.6.1.2.2 entitled “Example AGN Channel with Memory,” and, thus, analogous results (e.g., finite block length capacities) are generated by utilizing similar procedures.
Specifically, the induced channel conditional distribution, in this example case, may be given by:

$$\mathbf{P}\{B_i \le b_i \mid B^{i-1} = b^{i-1}, A^{i} = a^{i}\} = \mathbf{P}\Big\{V_i \le b_i + \sum_{j=0}^{i-1} C_{i,j} b_j - D_{i,i} a_i - D_{i,i-1} a_{i-1}\Big\}, \quad i = 0, 1, \ldots, n.$$
The characterization of the finite block length feedback capacity, in this example, (i.e., a formula for the finite block length feedback capacity) is given by:
The optimal (e.g., capacity achieving) channel input distribution
$$\{\pi_i(da_i \mid a_{i-1}, b^{i-1}) \equiv P_{A_i|A_{i-1},B^{i-1}}(da_i \mid a_{i-1}, b^{i-1}) : i = 0, \ldots, n\}$$
is conditional Gaussian, in this example, and the average transmission cost is satisfied. Thus,
$$\{P_{B_i|B^{i-1}}(db_i \mid b^{i-1}) : i = 0, \ldots, n\}$$
is also conditionally Gaussian.
The information structure of the channel input distribution
$$\{\pi_i(da_i \mid a_{i-1}, b^{i-1}) \equiv P_{A_i|A_{i-1},B^{i-1}}(da_i \mid a_{i-1}, b^{i-1}) : i = 0, \ldots, n\}$$
implies the following parametrization of the channel and channel input distribution:
Thus, the finite block length feedback capacity may be characterized by using any state space representation of the channel output process. Note, if stationarity is assumed, the above equations are further simplified.
If a process {Xi: i=0, 1, . . . , n} of a source, such as the source 102, intended for transmission over this channel is Rp-valued, Gaussian distributed, and Markov, i.e.,
$$P_{X_i|X^{i-1}}(dx_i \mid x^{i-1}) = P_{X_i|X_{i-1}}(dx_i \mid x_{i-1}), \quad i = 0, 1, \ldots, n,$$
and the matrices which maximize the parametrization of the channel input distribution are denoted by:
$$\{\Gamma_{i,j}^{*}, \lambda_{i,i-1}^{*}, K_{U_i}^{*} : i = 0, \ldots, n,\ j = 0, \ldots, i-1\},$$
then the coding scheme which achieves the finite block length feedback capacity, in this case, is:
Although the above example illustrates a characterization of the finite block length capacity, channel input distributions, and capacity achieving coding schemes for a certain example channel of class A.2, the above discussed procedure may be extended to any continuous alphabet channel of class A.2, which channel is not necessarily driven by Gaussian noise processes.
By way of example, example finite block length feedback capacity formulas and input distributions, for other example classes of channels, determined according to the two-step procedure (described with reference to
For an example channel conditional distribution,

$$\{P_{B_i|B_{i-1},A_i}(db_i \mid b_{i-1}, a_i) : i = 0, 1, \ldots, n\},$$
referred to as “Unit Memory Channel Output,” the optimal channel input distribution corresponding to $C^{FB}_{A^n \to B^n}$ is included in the subset

$$\mathcal{P}^{B.1}_{[0,n]} \triangleq \{P_{A_i|B_{i-1}}(da_i \mid b_{i-1}) : i = 0, 1, \ldots, n\} \subset \mathcal{P}^{A.1}_{[0,n]}.$$
This subset implies that the corresponding joint process $\{(A_i, B_i) : i = 0, \ldots, n\}$ and channel output process $\{B_i : i = 0, \ldots, n\}$ are first-order Markov, i.e.,

$$P_{A_i,B_i|A^{i-1},B^{i-1}}(da_i, db_i \mid a^{i-1}, b^{i-1}) = P_{A_i,B_i|A_{i-1},B_{i-1}}(da_i, db_i \mid a_{i-1}, b_{i-1}),$$

$$P_{B_i|B^{i-1}}(db_i \mid b^{i-1}) = P_{B_i|B_{i-1}}(db_i \mid b_{i-1}).$$
These findings are applicable to any channel input and output alphabets as those described earlier, including countable and continuous alphabet spaces.
The characterization of the finite block length feedback capacity is:
This characterization, or capacity formula, is generated by: (i) applying step one of the two-step procedure to determine a candidate set of optimal channel input distributions $\mathcal{P}^{A.1}_{[0,n]}$ (e.g., because the channel is a special case of channel distributions of class A.1); and (ii) applying step two of the two-step procedure to determine that the optimal channel input distribution is included in the narrower (e.g., including fewer elements) set $\mathcal{P}^{B.1}_{[0,n]}$.
If a transmission cost is imposed corresponding to
$$\gamma_i^{B.1}(a_i, b_{i-1}),$$
then the example characterization of the finite block length feedback capacity with transmission cost is:
Similar to the models discussed in sections B.6.1.1 and B.6.2.1, channel distributions of example class B.1 may include one or both of distributions defined on finite and countable alphabet spaces and distributions defined on continuous alphabet spaces, which distributions may be induced by the models described below.
A nonlinear model of a channel in the class B.1 with continuous alphabet spaces may include a recursive expression:
The characterization of the finite block length feedback capacity of the channel defined by this model is given by:
A computing device and/or operator of a communication system may perform the optimization or maximization of $C^{FB,NCM\text{-}B.1}_{A^n \to B^n}(\kappa)$ utilizing dynamic programming. For example, let $C_t(b_{t-1})$, for $t = n, n-1, \ldots, 0$, denote the “cost-to-go” (corresponding to $C^{FB,NCM\text{-}B.1}_{A^n \to B^n}(\kappa)$) from time $t$ to the terminal time $n$, conditioned on $B_{t-1} = b_{t-1}$.
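The elided recursions may be sketched, in a hedged form consistent with the cost-to-go definition above (with a Lagrange multiplier term added to the bracketed integrand when a transmission cost is imposed), as:

$$C_n(b_{n-1}) = \sup_{P_{A_n|B_{n-1}}(\cdot \mid b_{n-1})} \int_{\mathcal{A}_n \times \mathcal{B}_n} \log\Big(\frac{dP_{B_n|B_{n-1},A_n}(\cdot \mid b_{n-1}, a_n)}{dP_{B_n|B_{n-1}}(\cdot \mid b_{n-1})}(b_n)\Big)\, P_{B_n|B_{n-1},A_n}(db_n \mid b_{n-1}, a_n)\, P_{A_n|B_{n-1}}(da_n \mid b_{n-1}),$$

$$C_t(b_{t-1}) = \sup_{P_{A_t|B_{t-1}}(\cdot \mid b_{t-1})} \int_{\mathcal{A}_t \times \mathcal{B}_t} \Big[\log\Big(\frac{dP_{B_t|B_{t-1},A_t}(\cdot \mid b_{t-1}, a_t)}{dP_{B_t|B_{t-1}}(\cdot \mid b_{t-1})}(b_t)\Big) + C_{t+1}(b_t)\Big]\, P_{B_t|B_{t-1},A_t}(db_t \mid b_{t-1}, a_t)\, P_{A_t|B_{t-1}}(da_t \mid b_{t-1}), \quad t = n-1, \ldots, 0.$$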
The characterization of the finite block length feedback capacity (or the formula for the finite block length feedback capacity) is then expressible as:
$$C^{FB,NCM\text{-}B.1}_{A^n \to B^n}(\kappa) = \int_{\mathcal{B}_{-1}} C_0(b_{-1})\, P_{B_{-1}}(db_{-1}).$$
Note, although not discussed in detail here, the above dynamic programming recursions also apply to channels defined on finite alphabet spaces.
In some implementations, once the optimal channel input distribution and the finite block length feedback capacity are characterized, a computing device may utilize the Blahut-Arimoto algorithm to compute the maximization of the dynamic programming, working backward in time (i.e., sequentially). This utilization may reduce the computational complexity of solving for the finite block length capacity, the capacity, and the corresponding capacity achieving channel input distribution.
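A minimal Python sketch of this sequential approach is presented below for a hypothetical binary channel of example class B.1 (the same illustrative channel matrix Q as in the earlier sketch). It combines the backward dynamic programming recursion with a Blahut-Arimoto-style fixed point for each per-stage maximization, in which the expected cost-to-go enters the standard Blahut-Arimoto exponent as a per-input reward. The channel matrix Q and horizon n are illustrative assumptions only, not part of the disclosure.

import numpy as np

def ba_stage(Q_b, reward, iters=500, tol=1e-12):
    """Blahut-Arimoto-style fixed point for the per-stage problem
    sup_p { I(p, Q_b) + sum_a p(a) * reward[a] }, where Q_b[a, b] is the
    channel matrix for a fixed previous output and reward[a] is the
    expected cost-to-go E[C_{t+1}(B_t) | b_{t-1}, a]."""
    n_a = Q_b.shape[0]
    p = np.full(n_a, 1.0 / n_a)
    for _ in range(iters):
        pb = p @ Q_b  # induced output distribution P(b | b_{t-1})
        d = np.sum(Q_b * np.log(Q_b / pb), axis=1) + reward  # KL term + reward
        p_new = p * np.exp(d)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    pb = p @ Q_b
    value = float(np.sum(p[:, None] * Q_b * np.log(Q_b / pb)) + p @ reward)
    return value, p

# Hypothetical binary class B.1 channel: Q[b_prev][a, b] = P(b | b_prev, a).
Q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
n = 10
cost_to_go = np.zeros(2)  # terminal condition C_{n+1}(b) = 0
for t in range(n, -1, -1):  # backward in time
    new_cost = np.zeros(2)
    for b_prev in range(2):
        reward = Q[b_prev] @ cost_to_go  # E[C_{t+1}(B_t) | b_prev, a]
        new_cost[b_prev], _ = ba_stage(Q[b_prev], reward)
    cost_to_go = new_cost
print("finite block length feedback capacity (nats), given b_{-1} = 0:", cost_to_go[0])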
To develop another characterization of the finite block length feedback capacity consider the information structure of the channel input distribution:
$$\{P_{A_i|B_{i-1}}(da_i \mid b_{i-1}) : i = 0, 1, \ldots, n\}.$$
This information structure implies that there exists a measurable function:
where $\{U_i : i = 0, 1, \ldots, n\}$ is an $r$-dimensional random process with distribution $\{P_{U_i}(du_i) : i = 0, 1, \ldots, n\}$ such that

$$\mathbf{P}\{U_i : e_i(b_{i-1}, U_i) \in da_i\} = P_{A_i|B_{i-1}}(da_i \mid b_{i-1}), \quad i = 0, \ldots, n.$$
Because the channel output is defined by the model, $B_i$,
Further, a class of admissible functions is:

$$\mathcal{E}_{[0,n]}^{NCM\text{-}B.1\text{-}IL}(\kappa) \triangleq \{e_i(b_{i-1}, u_i),\ i = 0, \ldots, n : \text{for a fixed } b_{i-1} \text{ the function } e_i(b_{i-1}, \cdot) \text{ is one-to-one and onto } \mathcal{A}_i \text{ for } i = 0, \ldots, n, \text{ and the average transmission cost constraint corresponding to } \kappa \text{ is satisfied}\}.$$
The alternative example characterization of the finite block length feedback capacity is:
A computing device or operator of a communication system may solve this example maximization of $C^{FB,NCM\text{-}B.1\text{-}IL}_{A^n \to B^n}(\kappa)$, e.g., utilizing dynamic programming.
A linear model of a channel in the class B.1 may be expressed via a recursive expression:
where $\{V_i : i = 0, 1, \ldots\}$ is independently distributed according to $P_{V_i}(dv_i)$, with zero mean and covariance matrix $K_{V_i}$, i.e., $\mu_{V_i} = 0$ and $K_{V_i} \triangleq \mathbf{E}\{V_i V_i^{T}\}$.
For each i=0, 1, . . . , n, the coefficients {Ci,j,Di,i: i=0, . . . , n, j=0, 1, . . . , i−1} are real-valued matrices with dimensions p by p and p by q, respectively (e.g., {p, q} being positive integers). The channel distribution is given by:
and an example characterization of the finite block length feedback capacity is given by:
The “cost-to-go” satisfies the following example dynamic programming recursions:
Further, a computing device and/or operator may generate an alternative characterization of the finite block length feedback capacity based on the information structure of the channel input distribution. For example, the channel input distribution:
$$\{P_{A_i|B_{i-1}}(da_i \mid b_{i-1}) : i = 0, \ldots, n\}$$
implies that there exists a measurable function:
where $\{U_i : i = 0, 1, \ldots, n\}$ is a $p$-dimensional random process with distribution $\{P_{U_i}(du_i) : i = 0, 1, \ldots, n\}$ such that

$$\mathbf{P}\{U_i : e_i(b_{i-1}, U_i) \in da_i\} = P_{A_i|B_{i-1}}(da_i \mid b_{i-1}), \quad i = 0, \ldots, n.$$
Based on the existence of this example measurable function:
and an example set of admissible functions is:

$$\mathcal{E}_{[0,n]}^{LCM\text{-}B.1\text{-}IL}(\kappa) \triangleq \{e_i(b_{i-1}, u_i),\ i = 0, \ldots, n : \text{for a fixed } b_{i-1} \text{ the function } e_i(b_{i-1}, \cdot) \text{ is one-to-one and onto } \mathcal{A}_i \text{ for } i = 0, \ldots, n, \text{ and the average transmission cost constraint corresponding to } \kappa \text{ is satisfied}\}.$$
The example alternative characterization of the finite block length feedback capacity is
B.7.1.2. Example MIMO AGN Channel with Memory
The following illustrates characterizations of capacities and channel input distributions for a special case of a channel in class B.1 described by a linear channel model. In the special case, a channel noise process is
$$V_i \sim N(0, K_{V_i}),$$
or approximately Gaussian.
By the entropy maximizing property of Gaussian distributions, the finite block length feedback capacity is bounded from above by the inequality
$$H(B^n) \le H(B^{g,n}),$$
where
$$B^{g,n} \triangleq \{B_i^{g} : i = 0, 1, \ldots, n\}$$
is Gaussian distributed. This upper bound may be achieved when
$$\{P_{A_i|B_{i-1}}(da_i \mid b_{i-1}) : i = 0, \ldots, n\}$$
is conditional Gaussian and the average transmission cost is satisfied, implying that
$$\{P_{B_i|B_{i-1}}(db_i \mid b_{i-1}) : i = 0, \ldots, n\}$$
is also conditionally Gaussian.
Similar to the other procedures described above with reference to linear channel models, a measurable function
exists such that
$$\mathbf{P}\{U_i : e_i(b_{i-1}, U_i) \in da_i\} = P_{A_i|B_{i-1}}(da_i \mid b_{i-1}).$$
Based on the existence of this measurable function, the channel is given by,
Because the channel output process is Gaussian distributed and a linear combination of any sequence of random variables is Gaussian distributed if and only if the sequence of random variables is also jointly Gaussian distributed, then the functions
$$\{e_i(\cdot,\cdot) : i = 0, 1, \ldots, n\}$$
are necessarily linear and {Ui:i=0, 1, . . . , n} is necessarily a Gaussian sequence, in this example case. These properties imply that the corresponding channel input process, denoted by
$$A^{g,n} \triangleq \{A_i^{g} : i = 0, 1, \ldots, n\}$$
is Gaussian distributed, satisfying the average transmission cost constraint. Moreover, $U_i$ is independent of $B^{g,i-1}$ for any $i = 0, 1, \ldots, n$. Thus,
$$A_i^{g} = e_i(B_{i-1}^{g}, U_i) = \Gamma_{i,i-1} B_{i-1}^{g} + U_i, \quad i = 0, 1, \ldots, n,$$

$$B_i^{g} = (D_{i,i}\Gamma_{i,i-1} - C_{i,i-1}) B_{i-1}^{g} + D_{i,i} U_i + V_i, \quad i = 0, \ldots, n.$$
Also, because the output process is conditionally Gaussian, in this example case, the conditional entropies
$$H(B_i^{g} \mid B_{i-1}^{g} = b_{i-1})$$

are independent of $b_{i-1}$ and
Then, defining

$$\overline{\Gamma}_{i,i-1} \triangleq D_{i,i}\Gamma_{i,i-1} - C_{i,i-1}, \quad i = 0, 1, \ldots, n, \quad \Gamma_{0,-1} = 0,$$
A computing device or operator of a communication system may express:
$$A_i^{g} = \Gamma_{i,i-1} B_{i-1}^{g} + U_i, \quad i = 0, 1, \ldots, n,$$

$$B_i^{g} = \overline{\Gamma}_{i,i-1} B_{i-1}^{g} + D_{i,i} U_i + V_i, \quad i = 0, 1, \ldots, n,$$

$$K_{B_{i-1}} \triangleq \mathbf{E}\{B_{i-1}^{g} (B_{i-1}^{g})^{T}\}, \quad i = 0, 1, \ldots, n,$$
with the following recursion:
$$K_{B_i} = \overline{\Gamma}_{i,i-1} K_{B_{i-1}} \overline{\Gamma}_{i,i-1}^{T} + D_{i,i} K_{U_i} D_{i,i}^{T} + K_{V_i}, \quad i = 0, 1, \ldots, n.$$
In this example case, the average transmission cost is:
and the example finite block length feedback capacity is characterized by
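The elided characterization may be sketched, in a hedged form consistent with the Gaussian realization and recursions above (not necessarily the exact display of the disclosure), as:

$$C^{FB}_{A^n \to B^n}(\kappa) = \sup \frac{1}{2} \sum_{i=0}^{n} \log \frac{|D_{i,i} K_{U_i} D_{i,i}^{T} + K_{V_i}|}{|K_{V_i}|},$$

where the supremum is over $\{\Gamma_{i,i-1}, K_{U_i} : i = 0, 1, \ldots, n\}$ satisfying the average transmission cost constraint

$$\frac{1}{n+1} \sum_{i=0}^{n} \mathrm{tr}\big(\Gamma_{i,i-1} K_{B_{i-1}} \Gamma_{i,i-1}^{T} + K_{U_i}\big) \le \kappa,$$

with $K_{B_i}$ given by the recursion above.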
If a process {Xi:i=0, 1, . . . , n} of a source, such as the source 102, intended for transmission over this channel is Rp-valued, Gaussian distributed, and Markov and the matrices which maximize the finite block length feedback capacity are
$$\{\Gamma_{i,i-1}^{*}, K_{U_i}^{*} : i = 0, 1, \ldots, n\},$$
then the coding scheme which achieves the finite block length feedback capacity is:
For an example channel conditional distribution,

$$\{P_{B_i|B_{i-M}^{i-1},A_i}(db_i \mid b_{i-M}^{i-1}, a_i) : i = 0, \ldots, n\},$$

where $M$ is a finite nonnegative integer, the optimal channel input distribution for $C^{FB}_{A^n \to B^n}$ is included in the subset

$$\mathcal{P}^{B.2}_{[0,n]} \triangleq \{P_{A_i|B_{i-M}^{i-1}}(da_i \mid b_{i-M}^{i-1}) : i = 0, \ldots, n\}.$$
This fact implies that the corresponding joint process $\{(A_i, B_i) : i = 0, \ldots, n\}$ and channel output process $\{B_i : i = 0, \ldots, n\}$ are $M$-order Markov processes.
An example characterization of the finite block length feedback capacity is
Also, if a transmission cost is imposed, then the example characterization may be expressed as
For an example channel conditional distribution,

$$\{P_{B_i|B_{i-M}^{i-1},A_{i-L}^{i}}(db_i \mid b_{i-M}^{i-1}, a_{i-L}^{i}) : i = 0, \ldots, n\},$$

where $M$ is a finite nonnegative integer, the optimal channel input distribution for $C^{FB}_{A^n \to B^n}$ is included in a corresponding subset $\mathcal{P}^{B.3}_{[0,n]}$.
An example characterization of the finite block length feedback capacity is
If a transmission cost is imposed corresponding to any instantaneous transmission cost function of classes A, B, and C, then the example characterization of the finite block length feedback capacity is given by the above expression for $\mathcal{P}^{B.3}_{[0,n]}$ using

$$\mathcal{P}^{B.3}_{[0,n]} \cap \mathcal{P}_{[0,n]}(\kappa).$$
By way of example, finite block length feedback capacity formulas and input distributions, for still further classes of channels, determined according to the two-step procedure (described with reference to
For an example channel with a channel conditional distribution:

$$\{P_{B_i|B_{i-2}^{i-1},A_{i-1}^{i}}(db_i \mid b_{i-2}^{i-1}, a_{i-1}^{i}) : i = 0, \ldots, n\},$$

the optimal channel input distribution for $C^{FB}_{A^n \to B^n}$ is included in a corresponding subset of distributions of the form $\{P_{A_i|A_{i-1},B_{i-2}^{i-1}}(da_i \mid a_{i-1}, b_{i-2}^{i-1}) : i = 0, \ldots, n\}$.
This inclusion implies that the corresponding joint process {(Ai,Bi):i=0, . . . , n} and channel output process {Bi:i=0, . . . , n} are second-order Markov processes, i.e.,
An example characterization of the finite block length feedback capacity is
If a transmission cost is imposed, this example characterization may be expressed as:
For an example channel conditional distribution,

$$\{P_{B_i|B_{i-M}^{i-1},A_{i-N}^{i}}(db_i \mid b_{i-M}^{i-1}, a_{i-N}^{i}) : i = 0, \ldots, n\},$$

the optimal channel input distribution for $C^{FB}_{A^n \to B^n}$ is included in a corresponding subset $\mathcal{P}^{C.2}_{[0,n]}$.
This inclusion implies that the corresponding joint process {(Ai,Bi):i=0, . . . , n} and channel output process {Bi:i=0, . . . , n} are limited-memory Markov processes.
For a special case of the example channel class C.2, an example channel conditional distribution is
$$\{P_{B_i|B_{i-1},A_i}(db_i \mid b_{i-1}, a_i) : i = 0, \ldots, n\},$$

and the optimal channel input distribution for $C^{FB}_{A^n \to B^n}$ is included in the subset $\{P_{A_i|B_{i-1}}(da_i \mid b_{i-1}) : i = 0, \ldots, n\}$.
This inclusion implies that the corresponding joint process {(Ai,Bi):i=0, . . . , n} and channel output process {Bi:i=0, . . . , n} are first-order Markov processes.
An example characterization of the finite block length feedback capacity, in this example case is
For another special case of the example channel class C.2, an example channel conditional distribution is
$$\{P_{B_i|B_{i-1},A_{i-1}^{i}}(db_i \mid b_{i-1}, a_{i-1}^{i}) : i = 0, \ldots, n\},$$

and the optimal channel input distribution for $C^{FB}_{A^n \to B^n}$ is included in the subset

$$\mathcal{P}^{UMCI}_{[0,n]} \triangleq \{P_{A_i|A_{i-1},B_{i-1}}(da_i \mid a_{i-1}, b_{i-1}) : i = 0, 1, \ldots, n\}.$$
This inclusion implies that the corresponding joint process {(Ai,Bi):i=0, . . . , n} and channel output process {Bi:i=0, . . . , n} are first-order Markov processes.
An example characterization of the finite block length feedback capacity, in this example case is
B.9. Example Characterizations of Capacities and Identification of Channel Input Distributions for MIMO ANonGN Channels with Memory
The two-step procedure (described with reference to
where {Vi:i=0, 1, . . . , n} is p-dimensional nonstationary non-Gaussian distributed noise, {Ai: i=0, 1, . . . , n} are q-dimensional channel input processes, and a condition “An is causally related to Vn” is represented, in the example model, by
The channel conditional distribution of the example nonstationary MIMO ANonGN channel is
An example characterization of the finite block length feedback capacity for the MIMO Additive Non-Gaussian Noise channels with memory may be expressed as:
where the transition probability distribution of the channel output process {Bi:i=0, 1, . . . , n} is given by the above mentioned model.
If the noise process is non-Gaussian with conditional distribution
{P_{V_i|V^{i−1}}(dv_i|v^{i−1}): i = 0, 1, …, n}
and the instantaneous transmission cost is
γ_i(a_{i−L}^{i}, b^{i−1}) ≜ γ_i^{1}(a_{i−L}^{i}, b_{i−L}^{i−1}), i = 0, …, n,
then another example characterization of the finite block length feedback capacity for a channel in this class of channels is
In another example, the noise process is Gaussian with conditional distribution
{P_{V_i|V^{i−1}}(dv_i|v^{i−1}): i = 0, 1, …, n}
and the instantaneous transmission cost function is
γ_i(a_{i−L}^{i}, b^{i−1}) ≜ γ_i^{1}(a_{i−L}^{i}, b_{i−L}^{i−1}), i = 0, …, n.
In this case, the optimal channel input distribution is
P*_{A_i|A_{i−L}^{i−1}, V^{i−1}}(a_i|a_{i−L}^{i−1}, v^{i−1}), i = 0, …, n,
and it is conditionally Gaussian. A Gaussian process
{A_i^g: i = 0, 1, …, n}
realizes this distribution, where
That is, at each time i = 0, 1, …, n, the Gaussian process is a linear combination of {A_{i−L}^{g,i−1}, V^{i−1}} and Gaussian random variables.
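For illustration, a minimal sketch of this kind of realization for a scalar channel is shown below. The memory length L, the coefficient vectors Gamma and Lam, and the noise path are hypothetical placeholders rather than parameters from the disclosure; the sketch only exhibits the structural form of a linear combination of past inputs, past noise, and an independent Gaussian term.

```python
import numpy as np

rng = np.random.default_rng(0)

def realize_gaussian_input(n, L, Gamma, Lam, z_std, V):
    """Generate {A_i^g} as a linear combination of the L most recent inputs,
    the L most recent noise values, and fresh Gaussian randomness."""
    A = np.zeros(n + 1)
    for i in range(n + 1):
        past_inputs = A[max(0, i - L):i]          # A_{i-L}^{g,i-1}
        past_noise = V[max(0, i - L):i]           # V_{i-L}^{i-1}
        A[i] = (Gamma[:len(past_inputs)] @ past_inputs[::-1]
                + Lam[:len(past_noise)] @ past_noise[::-1]
                + z_std * rng.standard_normal())  # independent Gaussian term
    return A

n, L = 20, 2
V = rng.standard_normal(n + 1)                    # stand-in noise sample path
A = realize_gaussian_input(n, L, np.array([0.5, -0.2]), np.array([0.3, 0.1]), 1.0, V)
print(A[:5])
```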
In yet another example, the noise process is Gaussian and satisfies
{P_{V_i|V_{i−L}^{i−1}}(dv_i|v_{i−L}^{i−1}): i = 0, 1, …, n}.
In this case, the optimal channel input distribution is
P*_{A_i|A_{i−L}^{i−1}, V_{i−L}^{i−1}}(a_i|a_{i−L}^{i−1}, v_{i−L}^{i−1}), i = 0, …, n,
and it is conditionally Gaussian. A Gaussian process
{A_i^g: i = 0, 1, …, n}
realizes this distribution, where
In still another example, the noise process is scalar Gaussian, A^n is causally related to V^n, and the noise is defined by
{P_{V_i|V^{i−1}}(dv_i|v^{i−1}): i = 0, 1, …, n},
and the instantaneous transmission cost function is
γ_i(a_{i−L}^{i}, b^{i−1}) ≜ γ(a_i), i = 0, 1, …, n.
In this case, the Gaussian process
{A_i^g: i = 0, 1, …, n}
defined by:
is a realization of the optimal channel input distribution. Further, if
{P_{V_i|V^{i−1}}(dv_i|v^{i−1}): i = 0, 1, …, n}
is stationary, the Gaussian process realization further reduces to:
The example two-step procedure (described with reference to the figures above) may also be utilized to determine whether feedback encoding increases the capacity of a channel with memory, as follows.
If an example channel has memory and an instantaneous transmission cost constraint
𝒫_{[0,n]}(κ),
then
C^{noFB}_{A^n;B^n}(κ) ≦ C^{FB}_{A^n→B^n}(κ),
and feedback encoding does not provide additional gain compared to encoding without feedback if and only if the following identity holds:
C^{FB}_{A^n→B^n}(κ) = C^{noFB}_{A^n;B^n}(κ).
Further, feedback encoding does not increase capacity without feedback if and only if:
where the limits are finite.
Next, further example notation is introduced. Specifically, let
where I(A^n; B^n) is a functional of the channel distribution and the channel input distribution without feedback, denoted by
{P_{A_i|A^{i−1}}(da_i|a^{i−1}): i = 0, 1, …, n}.
The maximum information structure without feedback, in this example notation, is
{a^{i−1}}, i = 0, 1, …, n.
Also, let
That is, I(A^n → B^n) is a functional of the channel distribution and the channel input distribution with feedback, denoted by
{P_{A_i|A^{i−1}, B^{i−1}}(da_i|a^{i−1}, b^{i−1}): i = 0, 1, …, n}.
The maximum information structure with feedback, in this example notation, is
{a^{i−1}, b^{i−1}}, i = 0, 1, …, n.
Using this example notation, for a channel with memory and an encoder with a transmission cost constraint, the finite block length capacity without feedback with transmission cost is:
and, similarly, the finite block length capacity with feedback with transmission cost is:
Also, define a set satisfying conditional independence as
𝒫_{[0,n]}^{CI} ≜ {P^{FB}_{A_i|A^{i−1}, B^{i−1}}(da_i|a^{i−1}, b^{i−1}) = P_{A_i|A^{i−1}}(da_i|a^{i−1}): i = 0, 1, …, n}.
The characterization of finite block length capacity with feedback is equal to the characterization of finite block length capacity without feedback if and only if the corresponding optimal channel input distribution of the former belongs to the above set. Thus, C^{FB}_{A^n→B^n} = C^{noFB}_{A^n;B^n} when this condition holds.
For a memoryless channel, this condition holds because the optimal channel input distribution which corresponds to the characterization of finite block length feedback capacity satisfies
P^{FB}_{A_i|A^{i−1}, B^{i−1}}(da_i|a^{i−1}, b^{i−1}) = P_{A_i}(da_i), i = 0, 1, …, n.
Also, the optimal channel input distribution which corresponds to the characterization of finite block length capacity without feedback satisfies
P^{noFB}_{A_i|A^{i−1}}(da_i|a^{i−1}) = P_{A_i}(da_i), i = 0, 1, …, n.
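To make the conditional-independence test concrete, the sketch below checks it numerically for a finite-alphabet, unit-memory input distribution. The array layout and tolerance are hypothetical choices; the point is only that an input kernel which does not vary with b_{i−1} collapses to a no-feedback distribution.

```python
import numpy as np

def is_conditionally_independent(P, tol=1e-9):
    """P[a_prev, b_prev, a] = P(a | a_prev, b_prev). Returns True when the
    kernel is (numerically) identical for every past output b_prev, i.e.,
    when the feedback input distribution collapses to a no-feedback one."""
    ref = P[:, 0, :]                                  # kernel at b_prev = 0
    return all(np.allclose(P[:, b, :], ref, atol=tol)
               for b in range(P.shape[1]))

# A binary input distribution that ignores the feedback symbol ...
P_nofb = np.zeros((2, 2, 2))
P_nofb[:, :, 0], P_nofb[:, :, 1] = 0.7, 0.3
print(is_conditionally_independent(P_nofb))           # True

# ... versus one that genuinely uses b_{i-1}.
P_fb = P_nofb.copy()
P_fb[:, 1, 0], P_fb[:, 1, 1] = 0.2, 0.8
print(is_conditionally_independent(P_fb))             # False
```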
For any example channel in classes A, B, and C and any instantaneous transmission cost function in classes A, B, and C, let
{P(db_i|𝓘_i^Q): 𝓘_i^Q ⊂ {a^i, b^{i−1}}, i = 0, …, n}
denote the channel distribution, let
{P^{FB,*}(da_i|𝓘_i^{FB}): i = 0, 1, …, n} ∈ 𝒫^{FB}_{[0,n]}(κ) ⊂ 𝒫_{[0,n]}(κ), 𝓘_i^{FB} ⊂ {a^{i−1}, b^{i−1}}, i = 0, 1, …, n,
denote the channel input distribution corresponding to the characterization of the finite block length feedback capacity C^{FB}_{A^n→B^n}(κ), and let
{P^{*,noFB}(da_i|𝓘_i^{noFB}): i = 0, 1, …, n}, 𝓘_i^{noFB} ⊂ {a^{i−1}}, i = 0, 1, …, n,
denote the corresponding channel input distribution without feedback. The feedback input distribution induces the joint distribution
P^{FB,*}_{A^n,B^n}(da^n, db^n)
and the channel output distribution
P^{FB,*}_{B^n}(db^n)
corresponding to the pair
{P^{FB,*}(da_i|𝓘_i^{FB}), P(db_i|𝓘_i^Q): i = 0, …, n}.
In addition to characterizing the “finite block length” feedback capacity for channels, such as the channel 106, the techniques of the present disclosure include methods to design encoders that achieve characterized capacities of channels with memory. These capacity achieving encoders are “information lossless encoders,” in that the mapping, implemented by the encoders, of information from a source to encoded symbols is sequentially invertible. Encoders violating this property are not able to achieve the capacity of a given channel.
Further, for each of the example channels (e.g., A, B, and C) discussed herein, a computing device and/or operator of a communication system may generate specific coding schemes or mappings for the “information lossless” encoder based on the characterization of the capacity for the channel and a corresponding optimal channel input distribution. That is, all capacity achieving encoders may be “information lossless” encoders, but, for a given channel, a computing device or operator may generate a specific “information lossless” coding scheme based on a characterization of capacity for that channel. A computing device and/or operator may also define additional conditions of the information lossless encoder for a specific channel based on a characterization of capacity for the specific channel.
The optimal (i.e., “information lossless”) encoding schemes designed according to the method discussed below may reduce the complexity of communication systems and operate optimally. For example, optimal operation may include an optimal operation in terms of the overall number of processing elements (e.g., CPUs) required to process transmissions and/or the number of memory elements and steps required to encode and decode messages. Such optimal encoding and decoding schemes may require small processing delays and short code lengths in comparison to encoding and decoding schemes designed based on an assumption of a channel without memory or based on a separate treatment of source codes and channel codes.
Although encoders and corresponding necessary and sufficient conditions discussed below are described, by way of example, with reference to the example communication system 100, which system 100 is a point-to-point communication system, encoders of the present disclosure may be applied to or implemented in systems other than point-to-point communication systems. For example, encoders of the present disclosure may be implemented in multi-user and network communication systems by repeating the procedures described below (for each user, for each node of a network, etc.). Implementations of encoders may even be utilized in joint collaborative communication.
In the process 600, an information lossless condition 604, a characterization of capacity 606 (e.g., finite block length feedback capacity), and an optimal channel input distribution 608 are input into an encoder design procedure 610. The characterization of capacity 606 and the optimal channel input distribution 608 may be generated according to an implementation of the two-step procedure (described with reference to the figures above).
The encoder design procedure 610 may, based on the information lossless condition 604, the characterization of capacity 606, and the optimal channel input distribution 608, generate the information lossless encoder 602. The information lossless encoder 602 may be utilized in an implementation of the system 100 as encoder 104, for example. Also, although not emphasized in the below description, decoders corresponding to the information lossless encoder 602 may also be generated by the encoder design procedure 610 based on the information lossless condition. In this manner, computing devices and/or operators of communication systems may design encoding and decoding processing for a communication system, such as the system 100.
Generally, an encoding process may encode received symbols x^n from a source, such as the source 102, into channel input symbols a^n ≜ {a_0, a_1, …, a_n}, a_j ∈ 𝒜_j, where j = 0, 1, …, n, as further discussed with reference to the figures.
Although
In the method 800, an information lossless condition is determined for a class of channels (block 802). For example, a computing device or operator may determine an information lossless condition for one or more of the classes of channel described in section B.1. entitled “Characterizing Channels.” The determination of the information lossless condition may include utilizing a general definition of information lossless encoders along with properties of a channel defined in a channel model to generate one or more specific information lossless conditions for a specific class of channels. Example determinations utilizing this procedure are discussed further below for example channels in classes A, B, and C (as defined in section B.1).
The method 800 may also include receiving a characterization of channel capacity and a corresponding channel input distribution (block 804). The characterization of channel capacity and corresponding input distribution may correspond to the class of channels associated with the information lossless conditions determined at block 802. The characterization or formula for the capacity may be a characterization or formula for a finite block length feedback capacity, feedback capacity, finite block length feedback capacity with transmission cost, feedback capacity with transmission cost, finite block length capacity without feedback, capacity without feedback, finite block length capacity without feedback and with transmission cost, or capacity without feedback and with transmission cost. Further, the characterization of channel capacity and corresponding channel input distribution may be an output from the two-step procedure (described with reference to the figures above).
A computing device and/or operator of a communication system may then utilize the information lossless condition(s), characterization of channel capacity, and channel input distribution to determine an encoding scheme (block 806). That is, the computing device and/or operator may design the encoding scheme based on both properties of the channel (e.g., capacity and optimal channel input distribution) and necessary and sufficient conditions for any encoder to achieve the capacity of a channel with memory. Certain of these optimal or capacity achieving encoding schemes for specific classes of channels are discussed in sections B.6.1.2.2., B.6.2.2., and B.7.1.2.
For further clarification and by way of example, the sections below include necessary and sufficient conditions for any encoder of example classes A and B to be information lossless. Based on these conditions and based on characterizations of capacities and optimal channel input distributions, computing devices and/or operators may design encoders for transmission of information over channels in the example classes A and B. The example corresponding to class A includes an encoder with feedback, and the example corresponding to class B includes an encoder without feedback.
A feedback encoder corresponding to the example class A (e.g., class A channels) may be referred to herein as
e^n ∈ ℰ_{[0,n]}.
The information structure entering the example encoder at any time i may be expressed as {a^{i−1}, x^i, b^{i−1}}.
By substituting a^{i−1} recursively into the right side of
a_i = e_i(a^{i−1}, x^i, b^{i−1}),
it follows that
a_i = e_i(a^{i−1}, x^i, b^{i−1}) ≡ ē_i(x^i, b^{i−1}), i = 0, …, n.
Thus, for any feedback encoder of example class A, the information structure of the encoder at each time instant i is
𝓘_i^e ≜ {a^{i−1}, x^i, b^{i−1}} ≡ {x^i, b^{i−1}}, i = 0, …, n,
and this information structure is the most general classical information structure among all possible deterministic nonanticipative encoders with feedback.
Given any feedback encoder of example class A and any source and channel distributions, the information from the source to the channel output is the directed information defined by
Also, given any encoder of Class A, the following chain rule of conditional mutual information holds:
where the inequality is due to the nonnegativity of conditional mutual information. In fact, the following stronger version of this expression holds:
For any e^n ∈ ℰ_{[0,n]}:
A feedback encoder of example class A is “information lossless” with respect to the directed information measures I(X^n → B^n) and I(A^n → B^n) if
I(X^n → B^n) = I(A^n → B^n), ∀ e^n ∈ ℰ^{IL}_{[0,n]} ⊂ ℰ_{[0,n]}.
A sufficient condition for a feedback encoder of example class A to be “information lossless” according to this definition is based on the following conditional independence:
MC1: X^i ↔ (A^i, B^{i−1}) ↔ B_i, i = 1, …, n.
Given any feedback encoder of example class A, if
X^i ↔ (A^i, B^{i−1}) ↔ B_i, then I(X^i; B_i|B^{i−1}, A^i) = 0,
and the following identity holds:
I(X^i; B_i|B^{i−1}) = I(A^i; B_i|B^{i−1}), i = 0, 1, …, n.
Hence, any class of functions which induces MC1 or, equivalently, induces the conditional independence on the sequence of channel conditional distributions
P_{B_i|B^{i−1}, A^i, X^i} = P_{B_i|B^{i−1}, A^i}, i = 0, 1, …, n,
is an information lossless class of functions.
Necessary and sufficient conditions for any class of functions to be an information lossless class, for a feedback encoder of an example class A channel, may be expressed as follows. A class A encoder is information lossless if:
for fixed b_{−1} ∈ 𝓑_{−1}, e_0(·, b_{−1}): 𝓧_0 → 𝓐_0 is one-to-one and onto 𝓐_0, and its inverse φ_0 ≜ e_0^{−1}(·, b_{−1}): 𝓐_0 → 𝓧_0 is measurable;
for fixed (a_0, x_0, b_{−1}, b_0) ∈ 𝓐_0 × 𝓧_0 × 𝓑_{−1} × 𝓑_0, e_1(a_0, x_0, ·, b_{−1}, b_0): 𝓧_1 → 𝓐_1 is one-to-one and onto 𝓐_1, and its inverse φ_1 ≜ e_1^{−1}(a_0, ·, x_0, b_{−1}, b_0): 𝓐_1 → 𝓧_1 is measurable; and
for any i = 2, 3, …, n, for fixed (a^{i−1}, x^{i−1}, b^{i−1}) ∈ 𝓐^{i−1} × 𝓧^{i−1} × 𝓑^{i−1}, e_i(a^{i−1}, x^{i−1}, ·, b^{i−1}): 𝓧_i → 𝓐_i is one-to-one and onto 𝓐_i, and its inverse φ_i ≜ e_i^{−1}(a^{i−1}, ·, x^{i−1}, b^{i−1}): 𝓐_i → 𝓧_i is measurable.
All of the examples of capacity achieving encoders with feedback, discussed in the above sections, satisfy these necessary and sufficient conditions and are, thus, information lossless.
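A minimal sketch of how such a condition can be verified for small, finite alphabets follows. The XOR encoder used is a hypothetical stand-in rather than one of the encoders of the present disclosure, and the brute-force check simply mirrors the one-to-one and onto requirement stated above.

```python
import itertools

X_ALPHABET = (0, 1)   # source alphabet
A_ALPHABET = (0, 1)   # channel input alphabet

def encoder(i, a_past, x_seq, b_past):
    """Hypothetical feedback encoder: a_i = x_i XOR b_{i-1} (b_{-1} = 0)."""
    b_prev = b_past[-1] if b_past else 0
    return x_seq[-1] ^ b_prev

def is_information_lossless(enc, n, b_alphabet=(0, 1)):
    """Brute-force the condition: for every fixed past, x_i -> a_i must be
    one-to-one and onto the channel input alphabet."""
    for i in range(n + 1):
        for a_past in itertools.product(A_ALPHABET, repeat=i):
            for x_past in itertools.product(X_ALPHABET, repeat=i):
                for b_past in itertools.product(b_alphabet, repeat=i):
                    outs = {enc(i, a_past, x_past + (x,), b_past)
                            for x in X_ALPHABET}
                    if outs != set(A_ALPHABET):
                        return False
    return True

print(is_information_lossless(encoder, n=3))          # True for XOR feedback
```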
This class of information lossless encoders also satisfies the following conditional independence:
MC2: A^i ↔ (X^i, B^{i−1}) ↔ B_i, i = 0, 1, …, n.
Still further, the following stronger identity holds for the information lossless encoders:
C.2.2. Encoders without Feedback Corresponding to Example Class B
An encoder without feedback corresponding to the example class B may be referred to herein as
e^n ∈ ℰ^{nfb}_{[0,n]}.
The information structure entering the example encoder without feedback at any time i may be expressed as {a^{i−1}, x^i}.
By substituting a^{i−1} recursively into the right side of
a_i = e_i(a^{i−1}, x^i),
it follows that
a_i = e_i(a^{i−1}, x^i) ≡ ē_i(x^i), i = 0, …, n.
Thus, for any encoder without feedback of example class B, the information structure of the encoder at each time instant i is
𝓘_i^{e,nfb} ≜ {a^{i−1}, x^i} ≡ {x^i}, i = 0, …, n,
and this information structure is the most general classical information structure among all possible deterministic nonanticipative encoders.
Given any encoder without feedback of example Class B and any source and channel distributions, the information from the source to the channel output is the mutual information defined by
An encoder without feedback of example class B is “information lossless” with respect to the mutual information measures I(X^n; B^n) and I(A^n; B^n) if
I(X^n; B^n) = I(A^n; B^n), ∀ e^n ∈ ℰ^{IL,nfb}_{[0,n]} ⊂ ℰ^{nfb}_{[0,n]}.
Necessary and sufficient conditions for any class of functions to be an information lossless class, for an encoder without feedback of example class B, may be expressed as follows. A class B encoder is information lossless if:
e_0(·): 𝓧_0 → 𝓐_0 is one-to-one and onto 𝓐_0, and its inverse φ_0 ≜ e_0^{−1}(·): 𝓐_0 → 𝓧_0 is measurable;
for fixed (a_0, x_0) ∈ 𝓐_0 × 𝓧_0, e_1(a_0, x_0, ·): 𝓧_1 → 𝓐_1 is one-to-one and onto 𝓐_1, and its inverse φ_1 ≜ e_1^{−1}(a_0, ·, x_0): 𝓐_1 → 𝓧_1 is measurable; and
for any i = 2, 3, …, n, for fixed (a^{i−1}, x^{i−1}) ∈ 𝓐^{i−1} × 𝓧^{i−1}, e_i(a^{i−1}, x^{i−1}, ·): 𝓧_i → 𝓐_i is one-to-one and onto 𝓐_i, and its inverse φ_i ≜ e_i^{−1}(a^{i−1}, ·, x^{i−1}): 𝓐_i → 𝓧_i is measurable.
Still further, the following stronger identity holds for the information lossless encoders:
D. Compressing Information with Zero-Delay
The below-discussed compressions of information with zero-delay utilize a “nonanticipative” (e.g., zero-delay) rate distortion function (RDF). This function along with various other relevant quantities are defined and discussed below before specifying a number of example compression schemes. Also, in the following discussion, a “non-stationary” source may refer to a source of information, such as the source 102, that varies in time.
Generally, an RDF may define a manner in which data is to be sent over a channel, such as the channel 106. For example, an RDF may define a number of bits per symbol of information that should be sent over a channel. The manner in which data is sent over a particular channel may be optimal or not depending on the particular RDF that defines compression for the particular channel. For example, to achieve a capacity, such as the capacities discussed in section B, a compression of information may utilize a nonanticipative RDF, as discussed below.
A nonanticipative RDF may be defined in terms of a “source distribution,” a “reproduction distribution,” and a “fidelity of reproduction,” in an implementation. The source distribution may be a collection of conditional probability distributions:
{P_{X_t|X^{t−1}}(dx_t|x^{t−1}): t = 0, 1, …, n}.
The reproduction distribution may also be a collection of conditional probability distributions:
{P_{Y_t|Y^{t−1}, X^t}(dy_t|y^{t−1}, x^t): t = 0, 1, …, n}.
Also, to express the nonanticipative RDF, the following family of causal conditional distributions is defined:
P⃗_{Y^n|X^n}(dy^n|x^n) ≜ ⊗_{t=0}^{n} P_{Y_t|Y^{t−1}, X^t}(dy_t|y^{t−1}, x^t).
Given the source distribution and reproduction distribution, the “joint distribution” is given by:
P_{X^n,Y^n}(dx^n, dy^n) ≜ P_{X^n}(dx^n) ⊗ P⃗_{Y^n|X^n}(dy^n|x^n),
and the “marginal distributions” are given by:
P_{Y^n}(dy^n) ≜ ∫_{𝓧^n} P_{X^n,Y^n}(dx^n, dy^n).
The distortion function of reproducing x_t by y_t, for t = 0, 1, …, n, may be a measurable function:
d_{0,n}: 𝓧_{0,n} × 𝓨_{0,n} → [0, ∞], d_{0,n}(x^n, y^n) ≜ Σ_{t=0}^{n} ρ_t(T^t x^n, T^t y^n),
where
T^t x^n ⊂ {x_0, x_1, …, x_t}, T^t y^n ⊂ {y_0, y_1, …, y_t}, t = 0, 1, …, n.
The fidelity set of reproduction conditional distributions is then defined by:
where D≧0.
The information measure of the nonanticipative RDF may be a special case of directed information defined by:
The finite time nonanticipative RDF may be defined by:
and the nonanticipative RDF rate may be defined by:
This RDF may specify that R^{na}(D) bits/symbol are to be transmitted over a channel, such as the channel 106, such that the distortion does not exceed D. The distortion D may be represented by any suitable distortion measure, such as the Hamming distortion measure, the squared-error distortion measure, etc.
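To make the rate/distortion trade-off concrete, a minimal sketch follows using the classical single-letter RDF of a memoryless Gaussian source with squared-error distortion, R(D) = max(0, ½ log₂(σ²/D)). This textbook formula is used only for illustration; the nonanticipative RDF defined above is a distinct, causal quantity.

```python
import math

def gaussian_rdf(sigma2, D):
    """Rate (bits/symbol) so that squared-error distortion stays <= D for a
    memoryless Gaussian source of variance sigma2."""
    if D >= sigma2:
        return 0.0            # reproducing the mean alone already suffices
    return 0.5 * math.log2(sigma2 / D)

for D in (0.1, 0.5, 1.0):
    print(f"D = {D}: R(D) = {gaussian_rdf(1.0, D):.3f} bits/symbol")
```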
D.2. Methods for Compressing Information with Zero-Delay
In the method 900, a computing device and/or operator of a communications system determines a nonanticipative rate distortion function (RDF) (block 902). The nonanticipative RDF may be a function, as described above in section D.1, that is not dependent on all transmitted symbols over a channel. Rather, the nonanticipative RDF may be causal in that the nonanticipative RDF only depends on previously transmitted information over a channel. In this manner, the determined nonanticipative RDF is zero-delay.
In some implementations, a computer and/or operator of a communications system may determine the nonanticipative RDF according to the definitions in section D.1 and further characterize the nonanticipative RDF according to properties of the channel over which information is to be transmitted. For example, a computer or operator may characterize the nonanticipative RDF according to properties of an AWGN channel, as further discussed in various examples presented below. In any event, the method 900 includes determining a nonanticipative RDF, where the nonanticipative RDF may be expressed in any suitable form, including a general form for any suitable channel or a more specific characterization according to properties of a specific channel.
The method 900 also includes determining a rate of information transfer based on the RDF and an allowed amount of distortion (block 904). In some implementations, the allowed amount of distortion may be a number or expression representing an amount of distortion (e.g., Hamming or squared-error distortion) of information transmitted over one or more channels. The computer or operator of a communications system may determine the allowed amount of distortion based on a desired performance (e.g., efficiency) of transmitting information and/or based on desired qualities of the received information after transmission. For example, for information representing speech, an operator of a communication system may specify (e.g., by configuring a computer to compress information) an allowed amount of distortion such that transmitted speech signals are understandable to a human after being decoded.
The computer and/or operator may provide the allowed amount of distortion to the determined nonanticipative RDF (e.g., as input) to determine the rate of information transfer, which rate corresponds to the allowed amount of distortion. In other words, when the allowed amount of distortion is provided to the nonanticipative RDF, the nonanticipative RDF produces a corresponding rate of information transfer. If information is transferred over a channel at this rate, the information will be distorted (e.g., according to a squared-error distortion) at a level at or below the allowed amount of distortion. The computer and/or operator implementing portions of the method 900 may express the rate as a number of bits per source symbol, a number of bytes per source symbol, or any other suitable amount of data per symbol of source information.
In some implementations, the computer and/or operator may utilize a buffer or range in determining the rate of information transfer. For example, instead of utilizing one allowed amount of distortion, the computer and/or operator may determine a range of information transfer rates (e.g., bits/symbol) based on an allowed range of distortions. In other examples, the computer and/or operator may determine a rate of information transfer based on a proxy amount of distortion, which proxy defines a buffer between an actual allowed amount of distortion and the proxy amount.
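A sketch of this buffering idea, under the same illustrative single-letter Gaussian RDF as above, is shown below; the margin value and interval endpoints are hypothetical.

```python
import math

def gaussian_rdf(sigma2, D):
    return 0.0 if D >= sigma2 else 0.5 * math.log2(sigma2 / D)

def rate_range(rdf, d_lo, d_hi, margin=0.0):
    """Map an allowed distortion interval, optionally shrunk by a proxy
    margin, to the corresponding interval of admissible rates."""
    d_lo_eff, d_hi_eff = d_lo * (1 - margin), d_hi * (1 - margin)
    # An RDF is nonincreasing in D, so the endpoints swap places.
    return rdf(d_hi_eff), rdf(d_lo_eff)

r_min, r_max = rate_range(lambda d: gaussian_rdf(1.0, d), 0.2, 0.4, margin=0.1)
print(f"admissible rates: [{r_min:.3f}, {r_max:.3f}] bits/symbol")
```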
Still further, the method 900 includes compressing information from a source according to the determined rate (block 906). The computing device implementing at least this portion of block 900 may apply any number of suitable compression methods to compress information from a source of information, such as the source 102, such that the compressed information results in a rate (e.g., bits/symbol) at or less than the rate determined at block 904. Example compression methods may include, by way of example, A-law algorithms, code-excited linear predictions, linear predictive coding, mu-law algorithms, block truncation coding, fast cosine transform algorithms, set partitioning in hierarchical trees, etc. Alternatively, a computing device may utilize a coding scheme designed according to the JSCC methods discussed herein to simultaneously compress and encode information from a source.
The method 900 still further includes transmitting the compressed information at the determined rate (block 908). Once compressed, components of a communication system, such as the system 100, may transmit the compressed information at the determined rate. In some implementations, this may include further encoding of the compressed information, and, in other implementations utilizing JSCC, the compressed information may already be encoded for optimal transmission over a channel.
As discussed in section D.1, a finite-time (e.g., not taking an infinite number of transmissions) expression for the nonanticipative RDF may include an infimum. Below, closed-form expressions are presented for a nonstationary optimal reproduction conditional distribution, which distribution attains the infimum of the finite-time nonanticipative RDF, R_{0,n}^{na}(D).
If the infimum of R_{0,n}^{na}(D) is attained at:
{P*_{Y_t|Y^{t−1}, X^t}(dy_t|y^{t−1}, x^t): t = 0, 1, …, n},
then R_{0,n}^{na}(D) satisfies the following backward in time recursive equations:
For t=n:
where s<0 is the Lagrange multiplier of the fidelity, and
For t=n−1, n−2, . . . , 0:
where g_{t,n}(x^t, y^t) is given, in terms of the source distribution P_{X_{t+1}|X^t}, by:
and the finite time nonanticipative RDF is given by:
If R_{0,n}^{na}(D) > 0, then s < 0, and
From the above expressions, given any source distribution, a computing device and/or operator of a communication system may identify a dependence of the optimal nonstationary reproduction distribution on past and present source symbols. However, the above expressions do not immediately yield a dependence on past reproduction symbols, referred to herein as “information structures.” Regarding this dependence, the following observations are presented:
(1) The dependence of P*_{Y_t|Y^{t−1}, X^t} …
(2) If ρ_n(T^n x^n, T^n y^n) = ρ̃(x_n, y_n), then P*_{Y_n|Y^{n−1}, X^n} …
(3) If P_{X_t|X^{t−1}} …
(4) If g_{t,n}(x^t, y^t) = g_{t,n}(x^t, y^{t−1}), t = 0, …, n−1, the optimal reproduction distribution (IV.227) reduces to
To further clarify the dependence of the optimal reproduction distribution on past reproductions, an alternative characterization of R_{0,n}^{na}(D) may include a maximization over a certain class of functions. A computing device and/or operator of a communications system may utilize this alternative characterization to derive lower bounds on R_{0,n}^{na}(D), which bounds are achievable.
The alternative characterization may be expressed as:
For s ∈ (−∞, 0], a necessary and sufficient condition to achieve the supremum of the above alternative characterization of R_{0,n}^{na}(D) is the existence of a probability measure
P*_{Y_t|Y^{t−1}}(dy_t|y^{t−1})
such that
λ_t(x^t, y^{t−1}) = {e^{sρ_t(x^t, …)} …
and such that,
The above alternative characterization of R_{0,n}^{na}(D) may allow a computing device and/or operator to compute R_{0,n}^{na}(D) exactly (e.g., as part of the example method 600) for a given source with memory.
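As a numerical illustration, the sketch below runs a classical Blahut-Arimoto iteration for the single-letter R(D) of a discrete memoryless source. It is not the finite-time nonanticipative recursion itself, but it shows the kind of alternating computation, driven by a Lagrange multiplier s < 0 as above, that a computing device might apply sequentially per time step.

```python
import numpy as np

def blahut_arimoto_rd(p_x, dist, s, iters=500):
    """Fix the multiplier s < 0 and alternate between the reproduction
    marginal q(y) and the conditional P(y|x). Returns (rate bits, distortion)."""
    ny = dist.shape[1]
    q = np.full(ny, 1.0 / ny)
    for _ in range(iters):
        w = q * np.exp(s * dist)                  # unnormalized P(y|x)
        P = w / w.sum(axis=1, keepdims=True)
        q = p_x @ P                               # updated output marginal
    D = float(p_x @ (P * dist).sum(axis=1))
    R = float(np.sum(p_x[:, None] * P * np.log2(P / q)))
    return R, D

p_x = np.array([0.5, 0.5])                        # uniform binary source
hamming = 1.0 - np.eye(2)                         # single letter Hamming
R, D = blahut_arimoto_rd(p_x, hamming, s=-2.0)
print(f"R = {R:.4f} bits at D = {D:.4f}")         # matches 1 - H(D) here
```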
In an example scenario, the following expression describes a p-dimensional nonstationary Gaussian source process:
X_{t+1} = A_t X_t + B_t W_t, t = 0, 1, …, n−1,
where
A_t ∈ ℝ^{p×p}, B_t ∈ ℝ^{p×k}, t = 0, 1, …, n−1.
For an Autoregressive Moving Average model with finite tap delays, there may exist a state space representation for some p. For the following example analysis in this scenario, assume:
(G1) X_0 ∈ ℝ^p is Gaussian, N(x̄_0, Σ_0);
(G2) {W_t: t = 0, …, n} is a k-dimensional IID Gaussian N(0, I_{k×k}) sequence, independent of X_0; and
(G3) the distortion function is single letter, defined by d_{0,n}(x^n, y^n) ≜ Σ_{t=0}^{n} ∥x_t − y_t∥.
The nonstationary optimal reproduction distribution may be given, for s≦0, by the following recursive equations:
Thus, the optimal reproduction distributions may be conditionally Gaussian, and the optimal reproduction distributions may be realized using a general Gaussian channel with memory, modeled by:
Y_t = Ā_t X_t + B̄_t Y_{t−1} + V_t^c, t = 0, …, n,
where
Ā_t ∈ ℝ^{p×p}, B̄_t ∈ ℝ^{p×p},
and {V_t^c: t = 0, …, n} are independent sequences of Gaussian vectors:
{N(0; Q_t): t = 0, …, n}.
Pre-processing at the encoder and/or decoder may introduce Gaussian error processes, in this example. Let
{K_t: t = 0, …, n}, K_t ≜ X_t − E{X_t|Y^{t−1}},
denote the error processes, and denote the covariance of the pre-processing by:
Λ_t ≜ E{K_t K_t^{tr}}, t = 0, …, n.
Also, let E_t be a unitary matrix such that:
E_t Λ_t E_t^{tr} = diag{λ_{t,1}, …, λ_{t,p}}, Γ_t ≜ E_t K_t, t = 0, …, n.
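A brief sketch of this diagonalization step follows; the particular Λ_t is a hypothetical placeholder, and the orthogonal eigenvector matrix of a real symmetric covariance plays the role of the unitary E_t.

```python
import numpy as np

Lambda_t = np.array([[2.0, 0.5],
                     [0.5, 1.0]])                 # hypothetical error covariance

eigvals, eigvecs = np.linalg.eigh(Lambda_t)       # symmetric eigendecomposition
E_t = eigvecs.T                                   # orthogonal, rows = eigenvectors

print(np.round(E_t @ Lambda_t @ E_t.T, 10))       # diag{lambda_t,1, lambda_t,2}
print(eigvals)                                    # the spectral values
```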
Analogously, to obtain the nonanticipative RDF in this example, the processes
{K̃_t: t = 0, …, n},
defined by
K̃_t ≜ Y_t − E{X_t|Y^{t−1}} ≡ Y_t − X̂_{t|t−1}, Γ̃_t = E_t K̃_t,
are introduced.
are introduced. Using properties of conditional entropy, observations for this example scenario include:
d_{0,n}(X^n, Y^n) = d_{0,n}(K^n, K̃^n) = Σ_{t=0}^{n} ∥K̃_t − K_t∥_{ℝ^p} = Σ_{t=0}^{n} ∥Γ̃_t − Γ_t∥_{ℝ^p}
and
R_{0,n}^{na}(D) = R_{0,n}^{na,K^n}(D).
Using these observations, a computing device and/or operator of a communication system may obtain the optimal (e.g., capacity achieving) nonanticipative RDF for the above defined multidimensional Gaussian process. The computing device and/or operator may also identify a “duality” between a multidimensional Gaussian process and a MIMO AWGN channel. These results are described below by way of example.
The R_{0,n}^{na}(D) of the example Gaussian source, according to the definitions in section D.1 and the above discussed model of the Gaussian source, is given by:
where the error
X_t − E{X_t|Y^{t−1}}
is Gaussian, and where X̂_{t|t−1} and the associated error covariances are given by the following Kalman filter equations:
σ{Y^t} = σ{K̃^t} = σ{B^t}, t = 0, …, n
(i.e., these processes generate the same information).
Y_t = E_t^{tr} H_t E_t (X_t − X̂_{t|t−1}) + E_t^{tr} Δ_t V_t^c + X̂_{t|t−1}
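Since the realization relies on Kalman filter equations, a generic one-step Kalman predictor is sketched below under an assumed linear-Gaussian observation model; the matrices A, B, C and the noise covariance R are hypothetical placeholders rather than the disclosure's exact quantities.

```python
import numpy as np

def kalman_predict(A, B, C, R, ys, x0_hat, P0):
    """One-step Kalman predictor: returns the predictions X_hat_{t|t-1} and
    the final prediction-error covariance for Y_t = C X_t + noise(R)."""
    x_hat, P = x0_hat, P0
    preds = []
    for y in ys:
        S = C @ P @ C.T + R                       # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)            # Kalman gain
        x_hat = x_hat + K @ (y - C @ x_hat)       # measurement update
        P = P - K @ C @ P
        x_hat = A @ x_hat                         # time update (prediction)
        P = A @ P @ A.T + B @ B.T
        preds.append(x_hat.copy())
    return preds, P

rng = np.random.default_rng(1)
A, B = np.array([[0.9]]), np.array([[0.5]])
C, R = np.array([[1.0]]), np.array([[0.1]])
ys = [np.array([v]) for v in rng.standard_normal(5)]
preds, P = kalman_predict(A, B, C, R, ys, np.zeros(1), np.eye(1))
print(preds[-1], P)
```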
The encoder 1002 and decoder 1006 of the system 1000 may encode and decode, respectively, with feedback according to:
A_t = E_t^{tr} H_t E_t (X_t − X̂_{t|t−1}),
Y_t = E_t^{tr} H_t E_t {X_t − X̂_{t|t−1}} + E_t^{tr} Δ_t V_t^c + X̂_{t|t−1}.
Alternatively, the encoder 1002 and decoder 1006 of the system 1000 may encode and decode, respectively, without feedback according to:
A_t = E_t^{tr} H_t E_t (X_t − E(X_t)),
Y_t = E_t^{tr} H_t E_t {X_t − E(X_t)} + E_t^{tr} Δ_t V_t^c + E(X_t).
By taking a limit of R_{0,n}^{na}(D), a computing device and/or operator may obtain the per unit time nonanticipative RDF R^{na}(D) from R_{0,n}^{na}(D). This R^{na}(D) may represent the rate distortion function of stationary (e.g., not varying in time) Gaussian sources of information, in this example. Specifically, the R^{na}(D) is obtained as follows:
where
lim_{n→∞} δ_{t,i} = δ_{∞,i} and lim_{n→∞} λ_{t,i} = λ_{∞,i}.
In addition, for a scalar Gaussian stationary source:
For an independent and identically distributed (IID) scalar Gaussian source:
Note, for a vector of independent sources with memory, the nonanticipative RDF involves water filling in the spatial dimension. The realization of these sources over an Additive White Gaussian Noise channel with an average power constraint not exceeding P is shown in the relevant figure.
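The spatial water filling can be sketched as the classical reverse water-filling computation below, in which each independent Gaussian component of variance λ_i receives distortion δ_i = min(θ, λ_i) with the water level θ chosen so the distortions sum to D; the variances and D are hypothetical.

```python
import numpy as np

def reverse_waterfill(variances, D, tol=1e-12):
    """Allocate per-component distortions delta_i = min(theta, lambda_i) so
    they sum to D, then sum the per-component Gaussian rates."""
    lo, hi = 0.0, max(variances)
    while hi - lo > tol:                          # bisect the water level
        theta = 0.5 * (lo + hi)
        if sum(min(theta, v) for v in variances) < D:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    deltas = [min(theta, v) for v in variances]
    rate = sum(0.5 * np.log2(v / d) for v, d in zip(variances, deltas) if d < v)
    return rate, deltas

rate, deltas = reverse_waterfill([2.0, 1.0, 0.25], D=0.9)
print(f"R(D) = {rate:.3f} bits, per-component distortions = {deltas}")
```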
Returning to the example realization, two cases are considered below.
(1) Capacity Achieving Realization with Feedback.
Consider the realization for the IID scalar Gaussian source
(i.e., for IID processes the classical RDF equals the nonanticipative RDF). Let X be a RV
N(0; σ_X²) with σ_X² ≧ D. Letting p = 1, then from (IV.256)-(IV.257) we have
which implies
Substituting into the encoder the limiting values, δ_{∞,1} = D, lim_{n→∞} q_{t,1} = q_{∞,1}, lim_{n→∞} P_{t,1} = P_{∞,1}, then for i = 0, 1, …
(2) Capacity Achieving Realization without Feedback.
When there is no feedback, all statements in (1) hold, λ_{∞,1} = σ_X², while E(X|B^{i−1}) is replaced by E(X_i|σ{∅}) = E(X) (i.e., only a priori information is used), and then (IV.264) reduces to
The following description may refer to “Joint Source Channel Coding,” or JSCC, as a coding/decoding scheme or a design of a coding/decoding scheme that does not separate source encoder and decoders and channel encoders and decoders. Such JSCC may produce a coding scheme that both compresses and encodes data for transmission, where the compression occurs with zero-delay and the compression and encoding allow the transmission to achieve a capacity of a channel.
To further clarify this point and by way of example, an example method of JSCC design is described below.
The example method 1200 includes determining a rate distortion function (RDF) of the source (block 1202). The source, for example, may be the source 102, and a computing device and/or operator may determine the RDF to be a nonanticipative RDF corresponding to the source 102, as further discussed in section D entitled “Compressing Information with Zero-Delay.” In this manner, the computing device and/or operator may determine an RDF that is zero-delay. In some implementations, the determined RDF may include one or more realizations of an encoder, channel, and decoder representing the RDF, and, in other applications, the determined RDF may include one or more sets of data, algorithms, or instructions that represent the RDF and are stored on a non-transitory computer-readable medium.
The example method 1200 also includes determining a capacity of a channel over which information, generated by the source, is to be transmitted (block 1204). The determined capacity may include a characterization or formula for a finite block length feedback capacity, feedback capacity, finite block length feedback capacity with transmission cost, feedback capacity with transmission cost, finite block length capacity without feedback, capacity without feedback, finite block length capacity without feedback and with transmission cost, or capacity without feedback and with transmission cost. A computing device and/or operator may determine this characterization or formula according to the method 300, for example.
An {encoder, decoder} pair is then identified (block 1206), where the {encoder, decoder} pair realizes the determined RDF and achieves the determined capacity. The {encoder, decoder} pair may include a single code for the encoder, which compresses and encodes symbols from a source, such as the source 102. The {encoder, decoder} pair may also include a single decoder, which code both decodes and decompresses symbols transmitted over the channel. In other words, the encoder may receive uncompressed symbols from a source and output compressed and encoded symbols for transmission over a channel, and the decoder may receive the transmitted symbols and output uncompressed and decoded symbols.
Identifying the {encoder, decoder} pair that realizes the determined RDF and achieves the determined capacity may include identifying an {encoder, decoder} pair that satisfies one or more conditions, in an implementation. As discussed further below, a computing device and/or operator may utilize the determined RDF and the determined capacity to generate specific conditions that the identified {encoder, decoder} pair must satisfy. Then the computing device and/or operator may identify a specific {encoder, decoder} pair that realizes the determined RDF and achieves the determined capacity.
Returning to the figures, additional methodologies for identifying such {encoder, decoder} pairs are discussed below.
E.2. Methodology of JSCC Design for General {Source, Channel} Pairs with Memory
In this section, example methodologies for JSCC design for general {source, channel} pairs with memory and with respect to {distortion function, transmission cost} pairs are developed. A computing device and/or operator may implement these methodologies as part of the example method 1200, for example, or in another method in which encoders and/or decoders are designed according to JSCC. The methodologies are for JSCC design with respect to general {distortion function, transmission cost} pairs, where some examples of distortion functions and transmission cost functions are further discussed in sections B and D.
To facilitate the development of JSCC methodologies, an example definition of nonanticipative code is introduced below. The definition is for any
{source, channel} ≡ {P_{X_i|X^{i−1}}, P_{B_i|B^{i−1}, A^i}}
with respect to any
{distortion function, transmission cost} ≡ {d_{0,n}, c_{0,n}}.
A set of randomized nonanticipative encoders with feedback, denoted by
ℰ^{fb,E}_{[0,n]},
may be a sequence of conditional distributions:
P_{A_i|A^{i−1}, B^{i−1}, X^i}(a_i|a^{i−1}, b^{i−1}, x^i), i = 0, 1, …, n.
Also, an example set of randomized feedback encoders embeds nonanticipative deterministic feedback encoders defined by:
The example encoders introduced above are nonanticipative in the sense that at each transmission time, i,
P_{A_i|A^{i−1}, B^{i−1}, X^i}(a_i|a^{i−1}, b^{i−1}, x^i) and e_i(a^{i−1}, b^{i−1}, x^i)
do not depend on any future symbols (e.g., symbols to be transmitted at future times). The encoders may only be functions of past and present symbols and past channel inputs and outputs, in an implementation.
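One way to picture this requirement is an encoder interface that can only ever be handed past and present symbols, as in the sketch below; the Bernoulli kernel inside is a hypothetical placeholder for a randomized encoding distribution.

```python
import random

class RandomizedFeedbackEncoder:
    """Emits a_i using only (a^{i-1}, b^{i-1}, x^i); future symbols are never
    available through this interface, so the encoder is nonanticipative by
    construction."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.a_hist, self.b_hist, self.x_hist = [], [], []

    def step(self, x_i, b_prev):
        self.x_hist.append(x_i)
        if b_prev is not None:
            self.b_hist.append(b_prev)
        # Placeholder kernel: lean toward repeating x_i, nudged by feedback.
        p_one = 0.8 if x_i == 1 else 0.2
        if self.b_hist:
            p_one = 0.5 * p_one + 0.5 * self.b_hist[-1]
        a_i = 1 if self.rng.random() < p_one else 0
        self.a_hist.append(a_i)
        return a_i

enc = RandomizedFeedbackEncoder()
print([enc.step(x, b) for x, b in [(1, None), (0, 1), (1, 0)]])
```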
A set of randomized and deterministic nonanticipative encoders without feedback may be defined by:
Also, a randomized decoder may be a sequence of conditional distributions defined by
embedding deterministic decoders denoted by:
Given any source, nonanticipative encoder as defined above, and randomized decoder as defined above, a joint probability distribution may be defined as:
P_{X^n,A^n,B^n,Y^n}(dx^n, da^n, db^n, dy^n) = ⊗_{i=0}^{n} P_{Y_i|Y^{i−1}, B^i}(dy_i|y^{i−1}, b^i) ⊗ P_{B_i|B^{i−1}, A^i}(db_i|b^{i−1}, a^i) ⊗ P_{A_i|A^{i−1}, B^{i−1}, X^i}(da_i|a^{i−1}, b^{i−1}, x^i) ⊗ P_{X_i|X^{i−1}}(dx_i|x^{i−1}).
Given any
{source, channel} ≡ {P_{X_i|X^{i−1}}, P_{B_i|B^{i−1}, A^i}},
a nonanticipative code for JSCC design of a system is a nonanticipative {encoder, decoder} pair
{P_{A_i|A^{i−1}, B^{i−1}, X^i}, P_{Y_i|Y^{i−1}, B^i}}.
This nonanticipative {encoder, decoder} pair has an excess distortion probability:
P{d_{0,n}(X^n, Y^n) > (n+1)d} ≦ ε, ε ∈ (0, 1), d ≧ 0,
and transmission cost:
The minimum excess distortion achievable by a nonanticipative code may be: D^o(n, ε, κ) ≜ inf{d: ∃ an (n, d, ε, κ) nonanticipative code}.
The following description defines an example “realization” of the conditional distribution corresponding to the R_{0,n}^{na}(D) for a given source.
Given a nonanticipative code, a realization of the optimal reproduction distribution
{P*_{Y_i|Y^{i−1}, X^i}(dy_i|y^{i−1}, x^i): i = 0, …, n}
corresponding to R_{0,n}^{na}(D) may be an {encoder, decoder} pair, with the encoder used with or without feedback, such that
P⃗*_{Y^n|X^n}(dy^n|x^n) is the conditional distribution of Y^n given X^n induced by the {encoder, channel, decoder} triple.
In such a case, the R_{0,n}^{na}(D) is “realizable,” because the realization operates with average distortion ≦ D.
In some implementations, given a system:
multiple realizations of the optimal reproduction distribution may exist. For example, in systems utilizing Gaussian sources and channels with memory, realizations with feedback encoding and without feedback encoding may exist.
To identify the {encoder, decoder} pair that realizes the determined RDF and achieves the determined capacity, as discussed with reference to the figures above, a computing device and/or operator may identify a realization satisfying
R_{0,n}^{na}(D) = C^{FB}_{A^n→B^n}(κ).
Then, for such a realization, a computing device and/or operator may compute the excess distortion probability and the minimum excess distortion achievable. Because the RDFs and capacities may be determined (e.g., according to the example methods 300 and 600), the {encoder, decoder} pair follows from the above condition.
E.3. Example JSCC Design for a Binary Symmetric Markov Source Over a Binary State Symmetric Channel with Memory
Although the following description provides an example design of an {encoder, decoder} pair according to JSCC for a specific type of source and channel, the techniques of the present disclosure may allow {encoder, decoder} pairs to be designed for other types of channels and/or sources. For example, the techniques disclosed herein may be utilized to generate {encoder, decoder} pairs for binary, symmetric, non-symmetric, Gaussian, non-Gaussian, stationary, non-stationary, etc. sources and/or channels. In fact, the realization of the optimal reproduction distribution may be developed for a variety of such {source, channel} pairs.
In the following example, an {encoder, decoder} pair, designed according to JSCC, realizes an RDF for a channel and operates at the capacity of the channel. Specifically, the example includes a binary symmetric Markov source (BSMS(p)) and a binary state symmetric channel (BSSC(α_1, β_1)). The example also utilizes a single letter Hamming definition of distortion and a single letter instantaneous cost function with and without feedback.
The nonanticipative RDF of the BSMS(p) may be developed from the following transition probabilities:
Also, the single letter Hamming distortion criterion between the source symbol and the reproduction (e.g., decoded) symbol may be defined by:
ρ(x_i, y_i) = 0 if x_i = y_i, and 1 otherwise.
According to the definitions of nonanticipative RDF presented in section D, Rna(D) of a BSMS(p) with single letter Hamming distortion is given by:
where D is the average distortion,
The optimal reproduction distribution may be given by:
To determine the capacity of the BSSC(α_1, β_1) with and without feedback, a computing device and/or operator may, in this example, consider a special case of the unit memory channel of the same structure as the optimal reproduction distribution:
Also, the state of the channel may be:
s_i ≜ a_i ⊕ b_{i−1}.
The single letter cost function may be:
Also, an average transmission cost is imposed, where the average transmission cost is defined by
E{γ(A_i, B_{i−1})} = κ = constant.
In this example, the average transmission cost at time i may be:
E{γ(A_i, B_{i−1})} = P{A_i ⊕ B_{i−1} = 1} = κ.
According to the techniques discussed in section B, the capacity of the BSSC(α_1, β_1) with and without feedback and average transmission cost is equal and may be expressed as:
C^{FB}_{A^∞→B^∞}(κ) = C^{noFB}_{A^∞;B^∞}(κ),
where
λ = α_1 κ + (1 − β_1)(1 − κ).
The capacity achieving channel input distribution without feedback is:
where
γ = α_1 κ + β_1 (1 − κ),
and the capacity achieving channel input distribution with feedback is:
Now that the capacity and RDF are determined for this example, a computing device and/or operator may identify or construct an {encoder, decoder} pair that realizes the determined RDF and achieves the determined capacity.
For
κ = m, α_1 = α, and β_1 = β,
the {encoder, decoder} pair to be identified or constructed must satisfy:
C^{FB}_{A^∞→B^∞}(κ) = H(p) − mH(α) − (1 − m)H(β) = R^{na}(D).
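For a quick numerical check of this matching condition, the sketch below evaluates H(p) − mH(α) − (1 − m)H(β) for hypothetical parameter values; a JSCC design would select the parameters so that this value equals the source's R^{na}(D) at the target distortion.

```python
import math

def H(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

# Hypothetical parameter values, for illustration only.
p, m, alpha, beta = 0.3, 0.4, 0.1, 0.15
capacity = H(p) - m * H(alpha) - (1 - m) * H(beta)
print(f"H(p) - m*H(alpha) - (1 - m)*H(beta) = {capacity:.4f} bits/transmission")
```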
For encoding without feedback (as shown in part (a) of the relevant figure), the {encoder, decoder} pair satisfying the above condition may, in this example, be the identity mapping on respective inputs. That is,
a_i = x_i, y_i = b_i,
or uncoded transmission is optimal. This result may imply that no encoding is performed and no decoding is performed. Also, a computing device and/or operator may evaluate the minimum excess distortion achievable. For encoding with feedback (shown in part (b) of the relevant figure), the {encoder, decoder} pair may be:
a_i = x_i ⊕ b_{i−1}, y_i = b_i, i = 0, 1, …
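The feedback mappings can be simulated directly, as in the sketch below; the reading of BSSC(α, β) as flipping the input with probability α or β depending on the state s_i = a_i ⊕ b_{i−1} is an assumption made for illustration, as are the numeric parameters.

```python
import random

random.seed(0)

def simulate_bssc_feedback(p, alpha, beta, n):
    """Simulate a_i = x_i XOR b_{i-1}, y_i = b_i over an assumed BSSC whose
    flip probability is alpha or beta depending on the state a_i XOR b_{i-1};
    the source is BSMS(p). Returns the empirical Hamming distortion."""
    x_prev, b_prev, errors = 0, 0, 0
    for _ in range(n):
        x = x_prev ^ (random.random() < p)        # BSMS(p) source transition
        a = x ^ b_prev                            # feedback encoder
        s = a ^ b_prev                            # channel state
        flip = random.random() < (alpha if s == 0 else beta)
        b = a ^ flip                              # channel output
        y = b                                     # decoder
        errors += (y != x)
        x_prev, b_prev = x, b
    return errors / n

print(f"empirical distortion ~ {simulate_bssc_feedback(0.3, 0.1, 0.1, 100_000):.4f}")
```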
Although this case is presented above by way of example, other {encoder, decoder} pairs can be computed using precisely the same methodology (e.g., by invoking the examples of optimal channel input distributions for channels with memory and feedback and transmission cost, and by using the expression of the finite time nonanticipative RDF).
The computing device 1650 may include an assortment of computer-readable media. Computer-readable media may be any media that may be accessed by the computing device 1650. By way of example, and not limitation, the computer-readable media may include both volatile and nonvolatile media, removable and non-removable media. Media may also include computer storage media and communication media. The computer-readable media may store information such as computer-readable instructions, program modules, data structures, or other data. Computer-storage media may include non-transitory media, such as a RAM 1652b, a ROM 1652a, EEPROM, optical storage disks, magnetic storage devices, and any other non-transitory medium which may be used to store computer-accessible information.
In an embodiment, the ROM 1652a and/or the RAM 1652b may store instructions that are executable by the processing unit 1651. For example, a basic input/output system (BIOS), containing algorithms to transfer information between components within the computer 1650, may be stored in the ROM 1652a. Data or program modules that are immediately accessible or are presently in use by the processing unit 1651 may be stored in the RAM 1652b. Data normally stored in the RAM 1652b while the computing device 1650 is in operation may include an operating system, application programs, program modules, and program data. In particular, the RAM 1652b may store one or more applications 1660 including one or more routines 1662, 1664, and 1666 implementing the functionality of the example methods 300, 400, 500, 800, 900, and 1200.
The computing device 1650 may also include other storage media such as a hard disk drive that may read from or write to non-removable, non-volatile magnetic media, a magnetic disk drive that reads from or writes to a removable, non-volatile magnetic disk, and an optical disk drive that reads from or writes to a removable, nonvolatile optical disk. Other storage media that may be used includes magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, and solid state ROM. The hard disk drive may be connected to the system bus 1654 through a non-removable memory interface such as interface 1674. A magnetic disk drive and optical disk drive may be connected to the system bus 1654 by a removable memory interface, such as interface 1690.
A user or operator may interact with the computing device 1650 through input devices such as a keyboard or a pointing device (e.g., a mouse). A user input interface 1702 may be coupled to the system bus 1654 to allow the input devices to communicate with the processing unit 1651. A display device such as a monitor 1722 may also be connected to the system bus 1654 via a video interface (not shown).
The computing device 1650 may operate in a networked environment using logical connections to one or more remote computing devices, for example. The remote computing device may be a personal computer (PC), a server, a router, or other common network node. The remote computing device typically includes many or all of the previously-described elements regarding the computing device 1650. Logical connections between the computing device 1650 and one or more remote computing devices may include a wide area network (WAN). A typical WAN is the Internet. When used in a WAN, the computing device 1650 may include a modem or other means for establishing communications over the WAN. The modem may be connected to the system bus 1654 via the network interface 1725, or other mechanism. In a networked environment, program modules depicted relative to the computing device 1650, may be stored in the remote memory storage device. As may be appreciated, other means of establishing a communications link between the computing device 1650 and a remote computing device may be used.
Upon reading this disclosure, those of ordinary skill in the art will appreciate still additional alternative structural and functional designs for characterizing channels and capacities, determining optimal input distributions, designing encoders and decoders, etc. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
The particular features, structures, or characteristics of any specific embodiment may be combined in any suitable manner and in any suitable combination with one or more other embodiments, including the use of selected features without corresponding use of other features. In addition, many modifications may be made to adapt a particular application, situation or material to the essential scope and spirit of the present invention. It is to be understood that other variations and modifications of the embodiments of the present invention described and illustrated herein are possible in light of the teachings herein and are to be considered part of the spirit and scope of the present invention. By way of example, and not limitation, the present disclosure contemplates at least the following aspects:
1. A method for characterizing a capacity of a channel with memory and feedback, the method comprising:
defining a channel model corresponding to the channel, wherein:
the channel is utilized to transmit information from a source to a destination, and
the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols;
determining a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization; and
solving, by one or more processors of a specially configured computing device, the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions.
2. The method of aspect 1, wherein determining the representation of the capacity includes:
determining, using stochastic optimal control techniques, a subset of distributions which include the channel input distribution that achieves the capacity.
3. The method of aspect 2, wherein the subset of distributions is a first subset of distributions, and wherein determining the representation of the capacity further includes:
determining, based on the first subset of distributions and using a variational equality of conditional mutual information or mutual information, an upper bound to identify a second subset of distributions which includes the channel input distribution that achieves the capacity, wherein the second subset of distributions is smaller than the first subset of distributions.
4. The method of any one of aspects 1 to 3, further comprising:
defining a transmission cost function, wherein the transmission cost function specifies a cost to transmit the information from the source to the destination and indicates a dependence on at least one of the past and present channel input symbols or the past channel output symbols,
wherein determining the representation of the capacity includes determining the representation of the capacity based on the channel model and based on the transmission cost function.
5. The method of aspect 4, wherein the subset of distributions is a first subset of distributions, wherein determining the representation of the capacity further includes:
determining, using stochastic optimal control techniques, a subset of distributions which include the channel input distribution that achieves the capacity,
determining, based on the channel model and the transmission cost function, if an output of the channel or the transmission cost function is dependent on quantities other than the past channel input symbols and the past channel output symbols,
if the output of the channel or the transmission cost function is dependent on quantities other than the past channel input symbols and the past channel output symbols,
determining, based on the first subset of distributions and using a variational equality of conditional mutual information, a second subset of distributions which include the optimal channel input distribution, wherein the second subset of distributions is smaller than the first subset of distributions, and
determining the representation of the capacity based on the second subset of distributions; and, if the output of the channel or the transmission cost function is not dependent on quantities other than the past channel input symbols and the past channel output symbols, determining the representation of the capacity based on the first subset of distributions.
6. The method of any one of aspects 1 to 5, wherein the representation of the capacity is a first representation of a finite-block length capacity, the method further comprising:
determining, based on the first representation of the finite-block length capacity, a second representation of the capacity, wherein the second representation is an upper bound on the first representation and represents the capacity for an infinite number of transmissions over the channel per unit time.
7. The method of any one of aspects 1 to 6, wherein the optimization is a maximization, and wherein solving the optimization of the representation of the capacity includes solving the maximization.
8. The method of any one of aspects 1 to 7, wherein solving the optimization includes solving the optimization using a dynamic programming algorithm.
9. The method of any one of aspects 1 to 8, wherein solving the optimization includes solving the optimization using a Blahut-Arimoto algorithm sequentially.
10. The method of any one of aspects 1 to 9, wherein solving the optimization includes:
determining a gain or reduction in the capacity for encoding with feedback as compared to encoding without feedback.
11. The method of any one of aspects 1 to 10, further comprising:
designing, by an encoder design procedure, a coding scheme based on the capacity of the channel determined by solving the optimization of the representation of the capacity, wherein the coding scheme utilizes the channel input distribution that achieves the capacity, and wherein the coding scheme satisfies a condition specifying that none of the information is lost in the coding scheme.
12. The method of aspect 11, further comprising:
configuring an encoder coupled to the channel to encode based on the coding scheme that satisfies the condition specifying that none of the information is lost in the coding scheme.
13. A system including:
one or more processors; and
one or more non-transitory memories,
wherein the one or more non-transitory memories store computer-readable instructions that specifically configure the system such that, when executed by the one or more processors, the computer-readable instructions cause the system to:
receive a channel model corresponding to the channel, wherein:
the channel is utilized to transmit information from a source to a destination, and
the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols;
determine a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization; and
solve the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions.
14. The system of aspect 13, wherein the computer-readable instructions further cause the system to:
determine a nonanticipative rate distortion function based on a received model of the source and based on a causal reproduction distribution, wherein the nonanticipative rate distortion function specifies a rate at which symbols from the source should be transmitted to the destination.
15. The system of aspect 14, wherein the computer-readable instructions further cause the system to:
design a coding scheme based on the capacity of the channel determined by solving the optimization of the representation of the capacity and based on the nonanticipative rate distortion function, wherein the coding scheme utilizes the channel input distribution that achieves the capacity, and wherein the coding scheme satisfies a condition specifying that none of the information is lost in the coding scheme; and
configure an encoder coupled to the channel to encode based on the coding scheme, wherein the coding scheme simultaneously compresses and encodes information transmitted over the channel.
16. The system of aspect 15, wherein designing the coding scheme includes designing the coding scheme by joint source channel coding.
17. The system of any one of aspects 13 to 16, wherein determining the representation of the capacity includes:
determining, using stochastic optimal control techniques, a subset of distributions which include the channel input distribution that achieves the capacity.
18. The system of aspect 17, wherein the subset of distributions is a first subset of distributions, and wherein determining the representation of the capacity further includes:
determining, based on the first subset of distributions and using a variational equality of conditional mutual information or mutual information, an upper bound to identify a second subset of distributions which includes the channel input distribution that achieves the capacity, wherein the second subset of distributions is smaller than the first subset of distributions.
19. The system of any one of aspects 13 to 18, wherein the representation of the capacity is a first representation of a finite-block length capacity, and wherein the computer-readable instructions further cause the system to:
determine, based on the first representation of the finite-block length capacity, a second representation of the capacity, wherein the second representation is an upper bound on the first representation and represents the capacity for an infinite number of transmissions over the channel per unit time.
20. The system of any one of aspects 13 to 19, wherein the channel model is a probabilistic map.
21. The system of any one of aspects 13 to 20, wherein the channel model is a function of the past and present channel input symbols.
This application claims priority to and the benefit of the filing date of U.S. Provisional Patent Application Ser. No. 61/987,797, which was entitled “Real-Time Capacity Achieving Encoder Design For Channels With Memory And Feedback” and filed on May 2, 2014. The entire disclosure of this application is hereby expressly incorporated by reference herein for all uses and purposes.