The invention relates to a network energy efficiency optimization method, in particular to an energy efficiency optimization method for an IRS-assisted NOMA THz network, and belongs to the technical field of communication.
The demand for ultra-high data rates for information and entertainment is growing rapidly in current and future wireless communication. However, the available spectrum resources are far from sufficient to support the increasing data rates, which makes it urgent to explore new broadband spectrum to break through this bottleneck. The Terahertz (THz) band has therefore attracted wide attention from the academic and industrial communities for its broadband characteristics and is regarded as a basic technology of sixth-generation (6G) mobile communications. THz waves occupy the frequency range of 0.1-10 THz, their available bandwidth is more than tens of times that of the millimeter wave, and the peak data rate is expected to reach 1-10 Tbit/s. Owing to the advantages of narrow beams and large communication capacity, the THz band offers great potential for ultra-high wireless transmission rates. However, due to the high frequency and small wavelength, the diffraction and penetration abilities of THz waves are worse than those of microwaves and millimeter waves, which makes THz signals more easily blocked by obstacles.
Because of this intense attenuation, the THz band is only suitable for short-distance communication scenarios such as shopping malls, subway stations and other indoor places, while outdoor THz applications require a great deal of relay equipment. Therefore, some scholars propose combining THz technology with an intelligent reflecting surface (IRS) to make transmission more efficient. An IRS is a reflecting surface composed of a large number of passive reflecting elements, each of which can independently adjust its angle to reflect the signal. The reflector can be placed on the surface of buildings to effectively reflect indoor and outdoor signals, and many studies have focused on IRS-assisted communication in the THz band.
The wide bandwidth of THz waves can serve more potential users and devices such as mobile users, industrial users and intelligent health-care terminals. However, the THz band suffers from small coverage caused by severe attenuation of THz signals; this defect leads to a heavy transmission burden and a rapid increase in energy consumption. Non-orthogonal multiple access (NOMA) is a promising wireless communication technique that allows users to share the same sub-channel simultaneously, multiplexing their communication resources in the power domain or the code domain. Compared with traditional orthogonal multiple access, NOMA is an effective technique for improving spectral efficiency and realizing massive wireless network connectivity [8]. NOMA encourages more user devices to share the same sub-channel and can provide many data services to increase the resource utilization rate of a THz network. In order to realize massive wireless connectivity and increase the resource utilization rate in THz communication, NOMA has been applied to THz networks in recent studies. By introducing NOMA into a THz cellular network, a sub-channel and power allocation approach based on an alternating direction method has been put forward to optimize energy efficiency.
Inspired by the capacity enhancement of NOMA and the notable coverage improvement of IRSs, the combination of NOMA with IRS-assisted communications has aroused significant interest. For example, some researchers proposed a design of IRS-assisted NOMA downlink transmission in which the channel vectors of marginal users are aligned in a preset spatial direction with the aid of IRSs. Other researchers put forward an IRS-aided NOMA network and proposed an energy-efficient scheme to jointly optimize the transmit beamforming of the BS and the reflection phase shifts of the IRS. In addition, IRS-enhanced millimeter-wave NOMA systems have been devised with joint optimization of beamforming and power allocation.
The resource management mechanism of traditional networks is relatively mature, but when applied to THz networks it still has many limitations, which mainly include:
The technical issue to be settled by the invention is to provide an energy efficiency optimization method for an IRS-assisted NOMA THz network to overcome the defects of the prior art.
The technical solution adopted by the invention to settle the above technical issue is as follows:
An energy efficiency optimization method for an IRS-assisted NOMA THz network comprises the following steps:
Further, in Step 1,
NB antennas are configured for the base station and NU antennas for each user, and the users are classified into BS users and IRS users; assume the number of BS users is L and the BS users are represented by the set {1, 2, . . . , L}; the IRS users are divided into M clusters, wherein each cluster comprises K users and is served by G IRS elements, with the cluster set {1, 2, . . . , M}, the element set {1, 2, . . . , G} and the per-cluster user set {1, 2, . . . , K}; the bandwidth of the system is divided into multiple sub-channels, wherein each BS user and each IRS user uses one sub-channel, and assume the BS users use the first L sub-channels and the IRS users use the remaining sub-channels.
Further, in Step 2, the channel model for the BS users is specifically as follows:
Considering the severe attenuation of THz signals, the THz channel from the BS to the users is modeled as a LoS path, neglecting reflected, scattered and diffracted fading; the channel gain from the BS to user l on sub-channel n is expressed as:
Wherein, PL(fn, dl) is the path loss of the THz LoS path, and fn and dl are the THz frequency and the distance between the BS and the user; the path loss of the THz LoS path consists of two parts, a free-space spreading loss and a molecular absorption loss, with the expression:
PL(fn,dl)=Lspread(fn,dl)×Labs(fn,dl)
Where, Lspread(fn, dl) and Labs(fn, dl) meet:
Where, c represents the speed of light and kabs(fn) represents the molecular absorption coefficient;
Assume the power transmitted to user l through sub-channel n is pl,nB; the received signal is:
Where, σ2 is the additive white Gaussian noise power, and pl′,n′B is the power transmitted to user l′ through sub-channel n′.
Further, in Step 2, the channel model for the IRS users is specifically as follows:
The channel for the IRS users is composed of the channel from the BS to an IRS, the channel from the IRS to the users, and the phase shifts of the IRS elements; according to the classical S-V model, the channel reflected by an IRS i to the kth user in the mth cluster is defined as:
H=HIΦHB
Wherein, HB represents the channel attenuation from the BS to the IRS, and HI represents the channel attenuation from the IRS to the users; Φ is a G×G diagonal matrix that represents the phase shifts of the IRS elements and meets Φ=diag([ejφ1, . . . ,ejφG]);
HB=AIdiag(α)AB*
Wherein
α=√{square root over (NBG/L1)}[α1, . . . ,αL1]T
AB=[αB(ϕ1), . . . ,αB(ϕL1)]
AI=[αI(θ1), . . . ,αI(θL1)]
Wherein, L1 represents the number of scattering paths from the BS to the IRS, αl is the complex gain of the lth path, and ϕl and θl are the corresponding angles of departure and arrival;
Where, λ is a wavelength of THz signals, and d is a distance between adjacent antenna elements or IRS elements;
Similar to the BS-IRS link, the IRS-user channel is formulated as:
HI=AUdiag(β)AI*
Wherein,
β=√{square root over (NUG/L2)}[β1, . . . ,βL2]T
AU=[αU(ψ1), . . . ,αU(ψL2)]
AI=[αI(ω1), . . . ,αI(ωL2)]
Wherein, L2 represents the number of scattering paths from the IRS to the users, βl is the complex gain of the lth path, and ψl and ωl are the corresponding angles of arrival and departure;
So, the overall BS-IRS-user channel is:
H=AUdiag(β)AI*ΦAIdiag(α)AB*
For the sake of brevity, assume NB=1 and NU=1, so that H reduces to a scalar channel gain hi,m,k,nI for the kth user in the mth cluster on sub-channel n; pi,m,k,nI represents the power transmitted to the kth user in the mth cluster on sub-channel n; the signal received by the kth user in the mth cluster on sub-channel n is expressed as:
Further, in Step 3,
The BS user rate is calculated:
Wherein, the signal-to-noise ratio for signal reception of BS user l is:
By the Shannon equation, the rate of user l is expressed as:
Where, B is the bandwidth;
The IRS user rate is calculated:
Wherein, the signal-to-interference-plus-noise ratio of the kth user in the mth cluster is:
The rate is expressed as:
Ri,m,kI=BΣn=1L+IMKlog2(1+SINRi,m,k,nI)
The total rate of the system is expressed as:
R=Σ
l=1
L
R
l
B+Σi=1IΣm=1MΣk=1KRi,m,kI
Further, in Step 4,
To maximize the overall energy efficiency of the network, the optimization problem for downlink power control and IRS phase shift adjustment is proposed, wherein the total transmission power of the BS is calculated as the sum of the power of all the users by:
The energy efficiency of the system network is defined as a ratio of a sum rate to the total power of the network, and the optimization problem is formulated as:
Wherein, C1 and C2 are power limitations of each user, C3 and C4 are minimum rate requirements, and C5 is an angle range.
Further, in Step 5,
The optimization problem is solved through the MADRL method: virtual agents are introduced into the BS as mappings of the users, and the virtual agents perform training to obtain optimal power and phase shift; a central control unit is configured on the BS to collect user information including channel state information (CSI), phase shift and power; a clock is set to ensure synchronous iteration during agent training, so that overall energy efficiency is calculated after each iteration; and the agents perform training according to the collected user information and real-time iteration results to realize global optimization.
Further, a Markov process with discrete time, a finite state space and a finite action space is used for training; the basic elements of reinforcement learning are represented by a tuple (S, A, R, P), where S represents the state space, A represents the action space, R represents the reward function, and P represents the state transition probability; and the state space and the action space are set as follows:
Wherein, φmin and φmax are the minimum and maximum phases of the IRS elements, Pmin and Pmax are the minimum and maximum user power, and the discrete quantities of the angle and the power are |φ| and |P| respectively; the action space is formed as A={a|a=(φ, p)}.
An optimal strategy π is obtained by the agents to realize a maximum cumulative reward, which is obtained by:
Where, γ∈(0,1] is a discount factor for future rewards;
During training, the agents select actions according to the optimal strategy π; at state st, an agent takes action at according to π, and at this moment its action-value function Qπ(st, at) is expressed as:
Qπ(st,at)=Eπ[Rt|s=st,a=at]
According to a Bellman equation,
An evaluation of the optimal strategy is expressed as:
The optimal strategy is obtained by:
To search for an optimal strategy in a large state space and a large action space, a DQN is introduced into the MADRL framework; the optimal strategy and the value function are approximated by a function according to Qi(s, a; θ)≈Q*(s, a), where θ is a weight updated by training; the DQN comprises a target network and a current network, which are trained by minimizing a loss function to optimize the parameter θ; the loss function is:
loss(θ)=(ytDQN−Qt(st,αt;θ))2
Wherein, Qt(st, αt; θ) is an output of the neural network with the parameter is θ at the state st, and ytDQN is an output of the target network with the parameter is {circumflex over (θ)} at the state st+1;
The loss function is minimized through a gradient descent algorithm, and the action-value function is approximated by the neural network until convergence.
Further, an action-value function Q with a random parameter θ, a target action-value function with parameter {circumflex over (θ)}=θ, an iteration index T and an experience pool are initialized;
For episode=1 to M do
Compared with the prior art, the invention has the following advantages and effects:
To expound in detail the technical solutions adopted by the invention to fulfill the desired technical purposes, the technical solutions of the embodiments of the invention will be described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are merely illustrative and are not all possible embodiments of the invention, and the technical means or technical features in the embodiments can be substituted without creative labor. The invention will be described in detail below with reference to the accompanying drawings and embodiments.
First of all, some of the specialized vocabulary used in the invention will be explained:
Application of NOMA to THz communication: in order to realize massive wireless connectivity and increase the resource utilization rate in THz communication, NOMA is applied to THz networks in recent studies. In Literature [5], NOMA is applied to THz cellular networks, and a sub-channel and power allocation scheme based on an alternating direction method is proposed to optimize energy efficiency. In addition, in Literature [4], a long-term user-centric window property of THz is captured, and the central sub-band and side sub-bands of a THz window are allocated to long-distance and short-distance NOMA groups respectively. In NOMA, the power allocated to users is related to the channel gain: users with small channel gains are allocated high power, and users with large channel gains are allocated low power [4]. NOMA can decode or demodulate superposed signals within its coverage.
IRS-aided NOMA network: inspired by the capacity enhancement of NOMA and the coverage increase of IRSs, IRS-aided NOMA communication has aroused the interest of researchers. Literature [6] proposes a design of IRS-aided NOMA downlink transmission, wherein the channel vectors of marginal users are aligned in a preset spatial direction with the aid of IRSs. In Literature [7], the author emphatically studies an IRS-aided NOMA network and puts forward an energy-saving scheme based on joint optimization of the transmit beamforming of a base station (BS) and the reflecting phase shifts of IRSs. In addition, Literature [8] proposes IRS-enhanced millimeter-wave NOMA systems and comes up with joint optimization of beamforming and power allocation. In Literature [5], the author focuses on an IRS-aided NOMA network and puts forward an energy-saving algorithm that maximizes the energy efficiency of the system by jointly optimizing the transmit beamforming of the BS and the reflecting phase shifts of the IRSs. Literature [6] studies an IRS-enhanced millimeter-wave NOMA system and puts forward joint optimization of active beamforming, passive beamforming and power distribution. In Literature [7], the effectiveness of IRSs in reducing the transmission power of NOMA systems is studied, and, considering the constraint of a minimum signal-to-interference ratio for each user, the problem of power minimization in an IRS-aided downlink NOMA system is formulated. In Literature [8], a simple design of IRS-assisted NOMA downlink transmission is put forward: the base station generates orthogonal beams in the spatial directions of the channels of near users by means of traditional space division multiple access, and, with the aid of IRSs, the effective channel vectors of marginal users are aligned in a preset spatial direction to ensure that these beams can serve extra marginal users.
Introduction of reinforcement learning: Literature [9]-[11] solve optimization problems by means of reinforcement learning. Literature [9] studies a method for power allocation in multi-cell networks which, unlike traditional optimization methods, uses deep reinforcement learning (DRL) for power allocation. The objective is to maximize the overall capacity of the whole network under random and dense distribution of base stations. A wireless resource mapping method and a deep neural network, the Deep Q-fully-connected network (DQFCNet), are provided. Compared with power allocation based on water-filling and the Q-learning method, the DQFCNet can realize a higher overall capacity, and simulation results indicate that its convergence rate and stability are remarkably improved. Literature [10] solves the problem of dynamic spectrum access by means of DRL. Specifically, a scene is studied in which different types of nodes share multiple discrete channels; these nodes have no capacity to communicate with other nodes and no prior knowledge about the behaviors of other nodes. The objective of each node is to maximize its own long-term transmission success rate. This problem is expressed as a Markov decision process (MDP) with unknown system dynamics. To overcome the challenges of an unknown environment and a large transition matrix, two specific DRL methods are used: the deep Q network (DQN) and the double deep Q network (DDQN). In addition, improved DQN techniques, including eligibility traces, prioritized experience replay and a "prediction process", are introduced. Simulation results indicate that the DQN and the DDQN can effectively learn the communication modes of different nodes without prior knowledge and achieve approximately optimal performance. Literature [11] points out that complete system observability is necessary for optimizing radio transmission power and user data rate in wireless systems.
Although this issue has been widely studied in the literature, there is still no practical solution that approaches the optimal performance merely by means of the observability of the available parts of an actual system. Literature [11] provides a reinforcement learning method for realizing downlink power control and rate adaptation in a cellular network to overcome this defect, and puts forward the design of a comprehensive learning framework, including a system state, a common reward function and an effective learning algorithm. System-level simulation results show that this design learns a power control strategy rapidly, fulfills a remarkable energy-saving effect and guarantees the fairness of users in the system.
As shown in
Step 1: users are classified into BS users and IRS users.
NB antennas are configured for the base station and NU antennas for each user, and the users are classified into BS users and IRS users; assume the number of BS users is L and the BS users are represented by the set {1, 2, . . . , L}; the IRS users are divided into M clusters, wherein each cluster comprises K users and is served by G IRS elements, with the cluster set {1, 2, . . . , M}, the element set {1, 2, . . . , G} and the per-cluster user set {1, 2, . . . , K}; the bandwidth of the system is divided into multiple sub-channels, wherein each BS user and each IRS user uses one sub-channel, and assume the BS users use the first L sub-channels and the IRS users use the remaining sub-channels.
Step 2: a channel model for the BS users and a channel model for the IRS users are defined.
The channel model for the BS users is specifically as follows:
Considering the severe attenuation of THz signals, the THz channel from the BS to the users is modeled as a LoS path, neglecting reflected, scattered and diffracted fading; the channel gain from the BS to user l on sub-channel n is expressed as:
Wherein, PL(fn, dl) is the path loss of the THz LoS path, and fn and dl are the THz frequency and the distance between the BS and the user; the path loss of the THz LoS path consists of two parts, a free-space spreading loss and a molecular absorption loss, with the expression:
PL(fn,dl)=Lspread(fn,dl)×Labs(fn,dl)
Where, Lspread(fn, dl) and Labs(fn, dl) meet
Where, c represents the speed of light and kabs(fn) represents the molecular absorption coefficient;
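The expressions for Lspread and Labs are not reproduced above; a commonly used form in the THz literature is Lspread(f, d)=(4πfd/c)2 and Labs(f, d)=ekabs(f)d. A minimal numeric sketch under that assumption (the value of kabs is illustrative, not measured data):

```python
import math

C = 3.0e8  # speed of light (m/s)

def spreading_loss(f_hz, d_m):
    """Free-space spreading loss (4*pi*f*d/c)^2, linear scale."""
    return (4 * math.pi * f_hz * d_m / C) ** 2

def absorption_loss(f_hz, d_m, k_abs):
    """Molecular absorption loss exp(k_abs(f) * d), linear scale."""
    return math.exp(k_abs * d_m)

def path_loss(f_hz, d_m, k_abs):
    """PL(f, d) = Lspread(f, d) * Labs(f, d)."""
    return spreading_loss(f_hz, d_m) * absorption_loss(f_hz, d_m, k_abs)

# Example: 0.3 THz carrier over 10 m; k_abs = 0.01 m^-1 is an illustrative value
pl_db = 10 * math.log10(path_loss(0.3e12, 10.0, 0.01))
```

For a 0.3 THz carrier over 10 m, the spreading term alone already exceeds 100 dB, which illustrates why THz links are short-range.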
Assume the power transmitted to user l through sub-channel n is pl,nB; the received signal is:
Where, σ2 is the additive white Gaussian noise power, and pl′,n′B is the power transmitted to user l′ through sub-channel n′.
The channel model for the IRS users is specifically as follows:
H=HIΦHB
Where, HB represents the channel attenuation from the BS to the IRS, and HI represents the channel attenuation from the IRS to the users; Φ is a G×G diagonal matrix that represents the phase shifts of the IRS elements and meets Φ=diag([ejφ1, . . . ,ejφG]);
HB=AIdiag(α)AB*
Wherein
α=√{square root over (NBG/L1)}[α1, . . . ,αL1]T
AB=[αB(ϕ1), . . . ,αB(ϕL1)]
AI=[αI(θ1), . . . ,αI(θL1)]
Where, L1 represents the number of scattering paths from the BS to the IRS, αl is the complex gain of the lth path, and ϕl and θl are the corresponding angles of departure and arrival;
Where, λ is a wavelength of THz signals, and d is a distance between adjacent antenna elements or IRS elements;
Similar to the BS-IRS link, the IRS-user channel is formulated as:
HI=AUdiag(β)AI*
Wherein,
β=√{square root over (NUG/L2)}[β1, . . . ,βL2]T
AU=[αU(ψ1), . . . ,αU(ψL2)]
AI=[αI(ω1), . . . ,αI(ωL2)]
Where, L2 represents the number of scattering paths from the IRS to the users, βl is the complex gain of the lth path, and ψl and ωl are the corresponding angles of arrival and departure;
So, the overall BS-IRS-user channel is:
H=AUdiag(β)AI*ΦAIdiag(α)AB*
For the sake of brevity, assume NB=1 and NU=1, so that H reduces to a scalar channel gain hi,m,k,nI for the kth user in the mth cluster on sub-channel n; pi,m,k,nI represents the power transmitted to the kth user in the mth cluster on sub-channel n; the signal received by the kth user in the mth cluster on sub-channel n is expressed as:
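The effect of the IRS phase-shift matrix Φ on the scalar cascaded channel can be illustrated with a small sketch; the per-element gains below are random placeholders rather than the S-V model itself, and choosing each phase to co-phase its reflected path maximizes the effective gain:

```python
import cmath, math, random

random.seed(0)
G = 16  # number of IRS elements serving the cluster (illustrative)

# Illustrative complex per-element gains (random placeholders, not the S-V model)
h_B = [0.1 * cmath.exp(1j * random.uniform(0, 2 * math.pi)) for _ in range(G)]
h_I = [0.1 * cmath.exp(1j * random.uniform(0, 2 * math.pi)) for _ in range(G)]

def effective_gain(phases):
    """Scalar cascaded channel h = sum_g h_I[g] * e^{j*phi_g} * h_B[g]."""
    return sum(hi * cmath.exp(1j * p) * hb
               for hi, p, hb in zip(h_I, phases, h_B))

rand_phases = [random.uniform(0, 2 * math.pi) for _ in range(G)]
# Choosing phi_g = -arg(h_I[g] * h_B[g]) co-phases every reflected path
opt_phases = [-cmath.phase(hi * hb) for hi, hb in zip(h_I, h_B)]

g_rand = abs(effective_gain(rand_phases))
g_opt = abs(effective_gain(opt_phases))  # coherent sum: G * 0.1 * 0.1 = 0.16
```

This co-phasing upper bound is what the phase-shift optimization in the later steps searches toward, under the discrete phase constraint.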
Step 3: a BS user rate and an IRS user rate are calculated respectively, and a total rate of a system is calculated.
The BS user rate is calculated:
Wherein, the signal-to-noise ratio for signal reception of BS user l is:
Where, B is the bandwidth;
The IRS user rate is calculated:
Wherein, the signal-to-interference-plus-noise ratio of the kth user in the mth cluster is:
The rate is expressed as:
Ri,m,kI=BΣn=1L+IMKlog2(1+SINRi,m,k,nI)
The total rate of the system is expressed as:
R=Σl=1LRlB+Σi=1IΣm=1MΣk=1KRi,m,kI
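The sum-rate computation above can be sketched as follows; the bandwidth and SINR values are illustrative, not system parameters from the embodiment:

```python
import math

B = 1e9  # sub-channel bandwidth in Hz (assumed value)

def shannon_rate(bandwidth_hz, sinr_linear):
    """Per-sub-channel Shannon rate B * log2(1 + SINR) in bit/s."""
    return bandwidth_hz * math.log2(1 + sinr_linear)

# Illustrative linear-scale SINRs: L = 2 BS users; I=1 IRS, M=1 cluster, K=2 users
bs_sinrs = [10.0, 5.0]
irs_sinrs = [[[3.0, 1.5]]]

# R = sum_l R_l^B + sum_i sum_m sum_k R_{i,m,k}^I
r_bs = sum(shannon_rate(B, s) for s in bs_sinrs)
r_irs = sum(shannon_rate(B, s)
            for irs in irs_sinrs for cluster in irs for s in cluster)
total_rate = r_bs + r_irs
```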
Step 4: an optimization problem for downlink power control and IRS phase shift adjustment is proposed.
To maximize the overall energy efficiency of the network, the optimization problem of downlink power control and IRS phase shift adjustment is proposed, wherein the total transmission power of the BS is the sum of the power of all the users and is expressed as:
The energy efficiency of the system network is defined as a ratio of a sum rate to the total power of the network, and the optimization problem is formulated as:
Where, C1 and C2 are power limitations of each user, C3 and C4 are minimum rate requirements, and C5 is an angle range.
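The objective and constraints C1-C5 can be sketched as follows; the function signatures and the feasibility check are illustrative assumptions, not the exact formulation:

```python
def energy_efficiency(sum_rate_bps, user_powers_w):
    """Objective: EE = sum rate / total transmit power (bit/s per watt, i.e. bit/J)."""
    return sum_rate_bps / sum(user_powers_w)

def feasible(user_powers_w, user_rates_bps, phases,
             p_min, p_max, r_min_bps, phi_min, phi_max):
    """Constraints C1/C2 (per-user power limits), C3/C4 (minimum rates),
    and C5 (IRS phase range)."""
    ok_power = all(p_min <= p <= p_max for p in user_powers_w)
    ok_rate = all(r >= r_min_bps for r in user_rates_bps)
    ok_phase = all(phi_min <= ph <= phi_max for ph in phases)
    return ok_power and ok_rate and ok_phase

ee = energy_efficiency(9.0e9, [0.5, 0.5, 1.0])  # 9 Gbit/s over 2 W total
```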
Step 5: the optimization problem is solved through an MADRL method.
The optimization problem is solved through the MADRL method: virtual agents are introduced into the BS as mappings of the users and perform training to obtain the optimal power and phase shifts; a central control unit is configured on the BS to collect user information including channel state information (CSI), phase shifts and power; a clock is set to ensure synchronous iteration during agent training, so that the overall energy efficiency is calculated after each iteration; and the agents perform training according to the collected user information and real-time iteration results to realize global optimization.
A Markov process with discrete time, a finite state space and a finite action space is used for training; the basic elements of reinforcement learning are represented by a tuple (S, A, R, P), where S represents the state space, A represents the action space, R represents the reward function, and P represents the state transition probability; and the state space and the action space are set as follows:
Wherein, φmin and φmax are the minimum and maximum phases of the IRS elements, Pmin and Pmax are the minimum and maximum user power, and the discrete quantities of the angle and the power are |φ| and |P| respectively; the action space is formed as A={a|a=(φ, p)};
3) Reward space: a difference between the overall energy efficiency in a current state and the overall energy efficiency in a previous state is defined as a reward, which is presented as:
Where, γ∈(0,1] is a discount factor for future rewards;
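The discrete action space a=(φ, p) and the energy-efficiency-difference reward can be sketched as follows; the grid sizes |φ|=8 and |P|=4 are illustrative choices:

```python
import itertools

def build_action_space(phi_min, phi_max, n_phi, p_min, p_max, n_p):
    """Enumerate actions a = (phi, p) over uniform grids of |phi| and |P| levels."""
    phis = [phi_min + i * (phi_max - phi_min) / (n_phi - 1) for i in range(n_phi)]
    powers = [p_min + i * (p_max - p_min) / (n_p - 1) for i in range(n_p)]
    return list(itertools.product(phis, powers))

def reward(ee_current, ee_previous):
    """Reward: difference between current and previous overall energy efficiency."""
    return ee_current - ee_previous

actions = build_action_space(0.0, 6.28318, 8, 0.01, 1.0, 4)  # |A| = 8 * 4 = 32
```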
During training, the agents select actions according to the optimal strategy π; at state st, an agent takes action at according to π, and at this moment its action-value function Qπ(st, at) is expressed as:
Qπ(st,at)=Eπ[Rt|s=st,a=at]
According to a Bellman equation,
An evaluation of the optimal strategy is expressed as:
The optimal strategy is obtained by:
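The Bellman backup behind the strategy evaluation above can be sketched as a simplified single-agent tabular Q-learning update (a toy self-loop example, not the network environment itself):

```python
from collections import defaultdict

GAMMA, ALPHA = 0.9, 0.1   # discount factor and learning rate
Q = defaultdict(float)    # tabular Q(s, a), zero-initialized

def q_update(s, a, r, s_next, actions):
    """One Bellman backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Repeated backups on a self-loop with reward 1 drive Q toward r / (1 - gamma) = 10
for _ in range(1000):
    q_update("s0", "a0", 1.0, "s0", ["a0"])
```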
To search for an optimal strategy in a large state space and a large action space, a DQN is introduced into the MADRL framework; the optimal strategy and the value function are approximated by a function according to Qi(s, a; θ)≈Q*(s, a), where θ is a weight updated by training; the DQN comprises a target network and a current network, which are trained by minimizing a loss function to optimize the parameter θ; the loss function is:
loss(θ)=(ytDQN−Qt(st,αt;θ))2
Wherein, Qt(st, at; θ) is the output of the neural network with parameter θ at state st, and ytDQN is the output of the target network with parameter {circumflex over (θ)} at state st+1;
The loss function is minimized through a gradient descent algorithm, and the action-value function is approximated by the neural network until convergence.
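The DQN loss and one gradient-descent step can be sketched with a toy linear approximator Q(s, a; θ)=θ·x(s, a); the scalar feature function is a placeholder, not the networks used in the method:

```python
def features(s, a):
    """Toy scalar feature x(s, a); a placeholder for a real network input."""
    return float(s + a)

def q_value(theta, s, a):
    """Linear approximator Q(s, a; theta) = theta * x(s, a)."""
    return theta * features(s, a)

def td_loss(theta, theta_target, s, a, r, s_next, actions, gamma=0.9):
    """loss(theta) = (y_DQN - Q(s,a;theta))^2, with y_DQN from the frozen target."""
    y = r + gamma * max(q_value(theta_target, s_next, a2) for a2 in actions)
    return (y - q_value(theta, s, a)) ** 2

def sgd_step(theta, theta_target, s, a, r, s_next, actions, lr=0.01, gamma=0.9):
    """One gradient-descent step on the squared TD error with respect to theta."""
    x = features(s, a)
    y = r + gamma * max(q_value(theta_target, s_next, a2) for a2 in actions)
    grad = -2.0 * (y - theta * x) * x
    return theta - lr * grad
```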
An action-value function Q with a random parameter θ, a target action-value function with parameter {circumflex over (θ)}=θ, an iteration index T and an experience pool are initialized;
For episode=1 to M
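The truncated training procedure above (experience pool, target parameter {circumflex over (θ)}, episode loop) can be sketched as follows; the environment stub, action set and hyper-parameters are illustrative placeholders for the actual energy-efficiency feedback:

```python
import random
from collections import deque, defaultdict

random.seed(0)

ACTIONS = [0, 1, 2, 3]          # illustrative discrete (phase, power) action indices
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.2
SYNC_EVERY, BATCH = 20, 8

Q = defaultdict(float)          # current action-value table
Q_target = defaultdict(float)   # target copy, synced every SYNC_EVERY steps
replay = deque(maxlen=500)      # experience pool

def step_env(state, action):
    """Stub environment: reward 1 when action == 2 (placeholder for EE feedback)."""
    return (1.0 if action == 2 else 0.0), (state + 1) % 4

def select_action(state):
    """Epsilon-greedy action selection."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

t = 0
for episode in range(30):
    state = 0
    for _ in range(25):
        a = select_action(state)
        r, s_next = step_env(state, a)
        replay.append((state, a, r, s_next))
        # Sample a minibatch from the experience pool and do TD backups
        for s, ab, rb, sn in random.sample(list(replay), min(BATCH, len(replay))):
            target = rb + GAMMA * max(Q_target[(sn, a2)] for a2 in ACTIONS)
            Q[(s, ab)] += ALPHA * (target - Q[(s, ab)])
        t += 1
        if t % SYNC_EVERY == 0:
            Q_target = Q.copy()  # periodic target update (theta_hat <- theta)
        state = s_next
```

The tabular form stands in for the neural networks of the MADRL method, but the control flow mirrors the described procedure: act, store experience, replay minibatches, and periodically copy the current parameters into the target.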
The above embodiments are merely preferred ones of the invention and are not intended to limit the invention in any form. Although the invention has been disclosed above with reference to the preferred embodiments, these embodiments are not used to limit it. Anyone skilled in the art can obtain equivalent embodiments by slightly changing or modifying the technical contents disclosed above without departing from the scope of the technical solutions of the invention. Any simple amendments, equivalent substitutions and improvements made to the above embodiments according to the technical essence and based on the spirit and principle of the invention should still fall within the protection scope of the technical solutions of the invention.
Literature list in this application:
Number | Date | Country | Kind |
---|---|---|---|
202210680248.1 | Jun 2022 | CN | national |