The present invention belongs to the field of wireless communication technologies, and in particular, to an intention-driven reinforcement learning-based path planning method.
With the development of the Internet of Things (IoT), wireless sensor networks are widely used to monitor the surrounding environment, for example, for air pollution monitoring, marine resource detection, and disaster warning. IoT sensors generally have limited energy and limited transmission ranges. Therefore, data collectors are required to collect the sensor data and further forward or process it. In recent years, as automatic control systems have become increasingly intelligent and reliable, intelligent devices such as unmanned aerial vehicles (UAVs), unmanned ships, and unmanned submarines have been deployed in military and civilian applications for difficult or tedious tasks in dangerous or inaccessible environments.
Although UAVs, unmanned ships, and unmanned submarines can, as data collectors, more easily complete data collection for monitoring networks, they face the key challenge of limited energy. After departing from the base, the data collector needs to move toward the sensor nodes, avoid collisions with surrounding obstacles and with the sensor nodes themselves, and return to the base within a specified time to avoid energy depletion. Therefore, a proper trajectory needs to be designed for the data collector according to the intentions of the data collector and the sensor nodes, to improve the data collection efficiency of the monitoring network.
Most existing data collection path planning solutions take the intentions of the data collector and the sensor nodes into consideration separately, without adjusting the data collection path according to their different intentions. Moreover, the existing path planning methods do not take into consideration dynamic obstacles that appear and move randomly in the monitoring environment. Therefore, the existing path planning methods suffer from low collection efficiency and reliability.
In order to solve the above technical problems, the present invention provides an intention-driven reinforcement learning-based path planning method. In the method, the intentions of a data collector and of the sensor nodes are expressed as rewards and penalties according to the monitoring network environment, which changes in real time, and the path of the data collector is planned through the Q-learning reinforcement learning method, so as to improve the efficiency and reliability of data collection.
An intention-driven reinforcement learning-based path planning method includes the following steps:
step A: obtaining a state S of the monitoring network;
step B: calculating a steering angle of the data collector and determining a target position;
step C: selecting an action according to an ε greedy policy;
step D: calculating a position of the data collector in a next time slot;
step E: calculating rewards and penalties corresponding to the intentions of the data collector and the sensor nodes, and updating a Q value; and
step F: determining whether the monitoring network reaches a termination state or the Q-learning satisfies a convergence condition.
Further, the state S of the monitoring network in step A includes: a direction of sailing φ[n] of the data collector in a time slot n, coordinates qu[n] of the data collector, available storage space {bam[n]}m∈M of the sensor nodes, data collection indicators {wm[n]}m∈M of the sensor nodes, distances {dum[n]}m∈M between the data collector and the sensor nodes, and distances {duk[n]}k∈K between the data collector and the surrounding obstacles, where M is the set of sensor nodes, K is the set of surrounding obstacles, wm[n]∈{0,1} is the data collection indicator of the sensor node m, wm[n]=1 indicates that the data collector completes the data collection of the sensor node m in the time slot n, and wm[n]=0 indicates that the data collection is not completed.
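For illustration, a minimal sketch of one possible in-memory representation of this state follows, assuming a tabular Python implementation in which the state serves as a Q-table key; all type and field names are illustrative and not prescribed by the method:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class NetworkState:
    """State S of the monitoring network in time slot n (illustrative)."""
    phi: float                # direction of sailing phi[n] of the data collector
    q_u: Tuple[float, float]  # coordinates q_u[n] of the data collector
    ba: Tuple[float, ...]     # available storage space ba_m[n] of each sensor node m
    w: Tuple[int, ...]        # data collection indicators w_m[n], each in {0, 1}
    d_um: Tuple[float, ...]   # distances d_um[n] between the collector and the sensor nodes
    d_uk: Tuple[float, ...]   # distances d_uk[n] between the collector and the obstacles
```

A frozen dataclass with tuple fields is hashable, so a state instance can directly index a Q table; in practice the continuous quantities would be discretized first.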
Further, a formula for calculating the steering angle of the data collector in step B is:
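As a hedged sketch, the steering angle may be taken as the difference between the bearing from the data collector to the target position and the current direction of sailing, clipped to a maximum per-slot steering angle; here qu[n] = (xu[n], yu[n]), the target position qt[n] = (xt[n], yt[n]) comes from step B, and the limit θmax is an assumed parameter rather than the patented formula:

$$
\theta[n] = \operatorname{atan2}\!\big(y_t[n]-y_u[n],\; x_t[n]-x_u[n]\big) - \varphi[n],
\qquad \theta[n] \in [-\theta_{\max},\, \theta_{\max}].
$$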
Further, steps of determining the target position in step B include:
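A hedged sketch of one plausible selection rule, consistent with the traversal-collection and on-time-return intentions described later, follows; the rule and all names are assumptions, not the patented sub-steps:

```python
import math

def determine_target(q_u, sensors, w, base, time_left, time_to_base):
    """Pick the target position for step B (illustrative rule).

    q_u: current coordinates of the data collector.
    sensors: list of sensor-node coordinates, indexed by m.
    w: data collection indicators; w[m] == 1 means node m is collected.
    base: coordinates of the base.
    time_left: time slots remaining before the deadline T.
    time_to_base: estimated time slots needed to sail back to the base.
    """
    # Return to the base if all nodes are collected or the deadline is near.
    if all(w) or time_left <= time_to_base:
        return base
    # Otherwise head for the nearest sensor node whose data is uncollected.
    pending = [m for m in range(len(sensors)) if not w[m]]
    return min((sensors[m] for m in pending),
               key=lambda s: math.dist(q_u, s))
```

Heading for the nearest uncollected node keeps the path short, while the deadline check protects the intention of returning to the base on time.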
Further, a method for selecting the action according to the ε greedy policy in step C is expressed as:
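The standard ε-greedy rule that this description names is, in the usual notation (with the action set \(\mathcal{A}\) assumed here to be the set of candidate steering angles):

$$
a[n] =
\begin{cases}
\arg\max_{a \in \mathcal{A}} Q(S[n], a), & \text{with probability } 1-\varepsilon,\\[2pt]
\text{a random action drawn uniformly from } \mathcal{A}, & \text{with probability } \varepsilon.
\end{cases}
$$

With probability 1−ε the data collector exploits the current Q-value estimates; with probability ε it explores a random action.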
Further, a formula for calculating the position in the next time slot of the data collector in step D is:
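A hedged sketch of such an update, assuming the data collector sails at a constant speed v for a slot of duration δ (both assumed parameters) along the updated direction of sailing:

$$
\varphi[n+1] = \varphi[n] + \theta[n],
\qquad
q_u[n+1] = q_u[n] + v\,\delta\,\big(\cos\varphi[n+1],\; \sin\varphi[n+1]\big).
$$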
Further, the rewards and penalties corresponding to the intentions of the data collector and the sensor nodes in step E are calculated as defined below:
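A hedged sketch of how such terms might be combined into one scalar signal, mirroring the rewards and penalties named in the embodiment below; all weights and function names are illustrative assumptions:

```python
def intention_reward(energy, overflow, collided, uncollected, late,
                     w_e=1.0, w_o=1.0, p_safe=100.0, p_cover=10.0, p_late=100.0):
    """Combine the intention terms into one scalar signal (illustrative weights).

    The weighted energy consumption and the data overflow enter as (negative)
    reward terms; violations of the safety, traversal-collection, and
    on-time-return intentions enter as penalties.
    """
    reward = -w_e * energy - w_o * overflow
    penalty = 0.0
    if collided:                      # safety intention violated
        penalty += p_safe
    penalty += p_cover * uncollected  # nodes still uncollected at the deadline
    if late:                          # failed to return to the base on time
        penalty += p_late
    return reward - penalty
```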
Further, a formula for updating the Q value in step E is:
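Since the method is Q-learning, the update takes the standard tabular form, where α is the learning rate, γ is the discount factor, and r[n] is the net reward-and-penalty signal of step E (the specific parameter values are not prescribed here):

$$
Q(S[n], a[n]) \leftarrow Q(S[n], a[n])
+ \alpha\Big( r[n] + \gamma \max_{a'} Q(S[n+1], a') - Q(S[n], a[n]) \Big).
$$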
Further, the termination state of the monitoring network in step F is that the data collector completes the data collection of the sensor nodes or the data collector does not complete the data collection at a time T, and the convergence condition of the Q-learning is expressed as:
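A hedged example of such a convergence condition, assuming a tolerance δ and an episode index e (both assumptions), is that the Q table stabilizes between successive training episodes:

$$
\max_{s,\,a} \big| Q^{(e+1)}(s,a) - Q^{(e)}(s,a) \big| < \delta.
$$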
Further, the intention-driven reinforcement learning-based path planning method is applicable to UAV-assisted ground IoT networks, unmanned-ship-assisted ocean monitoring networks, and unmanned-submarine-assisted seabed sensor networks.
An intention-driven reinforcement learning-based path planning method of the present invention has the following advantages:
Considering the intentions of the data collector and the sensor nodes, a data collection path covering all nodes is planned according to the random dynamic obstacles and the real-time sensed data in the monitoring environment. The Q-learning model optimizes the real-time coordinates of the data collector according to the current state information of the monitoring network, minimizes the intention difference, and improves the efficiency and reliability of data collection.
The following specifically describes an intention-driven reinforcement learning-based path planning method provided in the embodiments of the present invention with reference to the accompanying drawings.
A marine monitoring network includes one unmanned ship, M sensor nodes, and K obstacles such as sea islands, sea waves, and reefs. The unmanned ship sets out from a base, avoids collisions with the obstacles and the sensor nodes, completes data collection of each sensor node within a specified time T, and returns to the base. In order to satisfy the intentions of the unmanned ship and the sensor nodes, the weighted energy consumption of the unmanned ship and the data overflow of the sensor nodes are expressed as rewards for reinforcement learning, and a safety intention, a traversal collection intention, and an intention of returning to the base on time are expressed as penalties, so that the path of the unmanned ship is optimized by using the Q-learning method.
The rewards and penalties are calculated as defined below:
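To show how these pieces fit together, a compact sketch of the overall Q-learning loop for this embodiment follows. The environment object env, its reset/step interface, and all parameter values are illustrative assumptions, with the per-step reward assumed to already combine the intention rewards and penalties described above:

```python
import random
from collections import defaultdict

def train(env, actions, episodes=5000, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning for the unmanned-ship data collection path (sketch).

    env is a hypothetical environment exposing reset() -> state and
    step(action) -> (next_state, reward, done), where reward already
    combines the intention rewards and penalties.
    """
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection (step C)
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)  # steps B, D, and E of the method
            # standard Q-learning update (step E); no bootstrap at termination
            best_next = 0.0 if done else max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

A caller would supply the discretized action set (for example, candidate steering angles) and an environment wrapping the marine monitoring network dynamics.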
The intention-driven reinforcement learning-based path planning method of the present invention is applicable to UAV-assisted ground IoT networks, unmanned-ship-assisted ocean monitoring networks, and unmanned-submarine-assisted seabed sensor networks.
It may be understood that, although the present invention is described with reference to some embodiments, a person skilled in the art will appreciate that various changes or equivalent replacements may be made to the embodiments of the present invention without departing from the spirit and scope of the present invention. In addition, with the teachings of the present invention, these features and embodiments may be modified to suit specific situations and materials without departing from the spirit and scope of the present invention. Therefore, the present invention is not limited by the specific embodiments disclosed herein, and all embodiments falling within the scope of the claims of this application shall fall within the protection scope of the present invention.
Foreign Application Priority Data

Number | Date | Country | Kind
---|---|---|---
202111208888.4 | Oct 2021 | CN | national

PCT Filing and Publication Data

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CN2021/137549 | Dec. 13, 2021 | WO |

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO2023/065494 | Apr. 27, 2023 | WO | A

References Cited: U.S. Patent Documents

Number | Name | Date | Kind
---|---|---|---
11175663 | Johnson | Nov 2021 | B1
20070269077 | Neff et al. | Nov 2007 | A1
20200166928 | Sudarsan | May 2020 | A1
20210182718 | Li | Jun 2021 | A1
20220139078 | Murakoshi | May 2022 | A1
20220204055 | Watterson | Jun 2022 | A1

References Cited: Foreign Patent Documents

Number | Date | Country
---|---|---
111515932 | Aug 2020 | CN
112672307 | Apr 2021 | CN
112866911 | May 2021 | CN
113190039 | Jul 2021 | CN

References Cited: Other Publications

- Machine Translation of CN-113190039, retrieved from Clarivate Analytics, May 2024.
- Machine Translation of CN-112866911, retrieved from Clarivate Analytics, May 2024.
- Wang et al., "UAV-assisted Cluster Head Election for a UAV-based Wireless Sensor Network," 2018 IEEE 6th International Conference on Future Internet of Things and Cloud, 2018, pp. 267-274.

U.S. Publication Data

Number | Date | Country | Kind
---|---|---|---
20240219923 | Jul 2024 | US | A1