The following relates to autonomous sensing vehicles and more specifically relates to energy harvesting methods for such autonomous sensing vehicles.
Autonomous sensing vehicles (ASVs) such as drones, unmanned aerial vehicles, controlled balloons (e.g., hot air balloons), remotely operated vehicles and remotely operated underwater vehicles are vehicles that are typically unoccupied, usually highly maneuverable, and can be operated remotely by a user proximate to the vehicle or can be operated autonomously. Autonomously operated vehicles do not require a user to operate them. Autonomous vehicles may have the potential to greatly improve both the range and endurance of unmanned vehicles. Autonomous sensing vehicles may be used for a number of purposes including, but not limited to, remote sensing, commercial surveillance, filmmaking, disaster relief, geological exploration, agriculture, rescue operations, and the like. It can be noted that it would be ideal to increase the operation time and endurance of ASVs for these and other uses. Autonomous sensing vehicles may contain a plethora of sensors which can include, but are not limited to, accelerometers, altimeters, barometers, gyroscopes, thermal cameras, cameras, LiDAR (Light Detection and Ranging) sensors, etc. These sensors may be useful for increasing the operation time and endurance of an ASV or may be useful for the uses mentioned above. For instance, a gyroscope can be used for measuring or maintaining the orientation and angular velocity of the ASV and may improve the operational time of the ASV, whereas a camera may be used to take images during geological exploration.
One of the key constraints on the performance of the ASV can be energy. Energy can have a direct effect on the ASV's endurance, range, and payload capacity. To manage energy levels better, an ASV may extract energy from its environment; this is referred to as ‘energy harvesting’ herein. The ASV can use any method of energy harvesting, or a combination of energy harvesting methods, to harvest energy to increase its endurance and range. In one example, underwater ASVs may harvest energy using wave currents. In another example, land ASVs may harvest energy using solar power. In yet another example, aerial ASVs may harvest energy using thermal updrafts and ridge lifts (referred to as ‘soaring’ herein).
Soaring takes advantage of thermals to increase the flight time of an aerial ASV and has been studied and experimented with over the past two decades. For example, in 2010, Edwards & Silverberg demonstrated soaring against human piloted competitors in the Montague Cross Country Challenge for remote-controlled sailplanes. However, there may be challenges related to soaring.
Some challenges include: sensing (an effective soaring system should be able to sense the surrounding environment and the motion of atmospheric currents); energy harvesting (the aerial ASV should be equipped to make decisions to exploit energy and avoid sinking air); and energy level considerations (i.e. the aerial ASV should be able to consider its energy state and that of the environment as it navigates).
AutoSoar (Depenbusch, Nathan T., John J. Bird, and Jack W. Langelaan. “The AutoSOAR autonomous soaring aircraft part 2: Hardware implementation and flight results.” Journal of Field Robotics 35.4 (2018): 435-458) addresses some of these issues. AutoSoar teaches a method of autonomous soaring using thermal updrafts and ridge lifts. AutoSoar aims to address all the phases of thermal soaring such as: thermal detection, thermal latching and unlatching, thermal centering control, mapping, exploration, and flight management. AutoSoar aims to teach a method of increasing flight time by using thermals and easing the search for these thermals.
However, AutoSoar does not optimize the operational time of an ASV using energy harvesting while simultaneously achieving the ‘sensing’ or ‘observational’ goals of the ASV mission. There remains a need for a method which optimizes/balances the time the ASV spends ‘in observation’ or ‘sensing’ and the time the ASV spends ‘recharging’ or ‘energy harvesting’.
A multi-objective method of optimizing the time the ASV spends in ‘observation’ or ‘sensing’ and the time the ASV spends ‘recharging’ or ‘energy harvesting’ is taught herein. The method comprises: collecting data on observation points of interest, determining whether or not energy harvesting is needed, and effectively visiting the observation points of interest between the search for energy harvesting.
Embodiments will now be described with reference to the appended drawings wherein:
A multi-objective method of optimizing the time the ASV spends ‘in observation’ or ‘sensing’ and the time the ASV spends ‘recharging’ or ‘energy harvesting’ is taught herein. The method comprises: collecting data on observation points of interest, determining whether or not energy harvesting is needed, and effectively visiting the observation points of interest between the search for energy harvesting.
The method taught herein can increase the endurance of an ASV while effectively visiting the observation points of interest. The determination of the balance between energy harvesting, exploration for energy harvesting, and visiting the observation points is taught herein. It can be noted that by using different input signals, the ASV is directed to extend its energy levels and operational times while following the observation targets.
An optimized system of ASV observation is taught herein. The system comprises off-board computer software and a local on-board smart system. The off-board computer software program takes as inputs past flight data, weather forecasts, mission objectives, and the ASV's characteristics. The program then uses this information to generate a potential value function map and a list of potential paths. The maps and paths can be planned with awareness of weather forecasts, but forecasts are not required to generate them.
The local on-board smart system takes the information from the off-board computer, signals from sensors, and autopilot commands. It may or may not also have access to a localized and global weather system from a third party. This system can choose the next waypoint based on the information presented. It uses a Smart Decision Making System to balance between exploration and exploitation of the environment. This system will update the maps of energy sources. This system may choose and/or modify the bank angle or speed of an aircraft to make it behave in a more optimized fashion. In an embodiment, the suggested solution allows an aerial ASV to take advantage of thermals while behaving as expected in the observation missions. In another embodiment, the suggested solution allows an underwater ASV to take advantage of wave currents while behaving as expected in the observation missions.
This method enables the endurance of the ASV to increase while the observation goals are met. This method enables any ASV to effectively carry out its mission and take advantage of the free energy sources available in the environment (e.g., thermal updrafts, tidal energy, solar energy).
Thus, in one embodiment, using dynamic programming and available information such as the location of observation points, past flight information 102, wind and weather forecast 106, and vehicle energy states 104, the off-board algorithm 108 generates a value function grid 109 with a value function associated with each grid cell. The system can use this as an input to the on-board computer system that manages the vehicle behavior during operation and determines when changing behavior is appropriate.
Some inputs 111 of the on-board path planner 113 include, but are not limited to: sensor readings 114, energy capabilities of the ASV 104, energy reading 104b, autopilot commands 115, meteorological forecast 106, maps (terrain, land cover, underwater, etc.) 105, the output 112 from the off-board planner 108, potential value function map 109, the list of paths having a value associated with them 110, waypoints 116a, type(s) of energy harvesting required 100, and the importance factor for observation 107. The inputs 111 to the on-board path planner 113 may also include end point 101, region(s) of interest 101, no-fly zones, boundaries, past flight data 102, and aircraft parameters 103. In a preferred embodiment, the output of the on-board (local) path planner 113 comprises a map with potential probabilities 117 and/or the list of new waypoints 116.
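As an illustrative sketch only, the inputs listed above could be grouped into a single structure handed to the on-board planner; all field names below are hypothetical, and the comments reference the numerals above:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class PlannerInputs:
    """Hypothetical grouping of on-board path planner inputs 111."""
    sensor_readings: Dict[str, float]          # 114
    energy_capabilities: Dict[str, float]      # 104
    energy_reading: float                      # 104b
    autopilot_commands: Dict[str, float]       # 115
    meteorological_forecast: Optional[dict]    # 106
    maps: Dict[str, object]                    # 105 (terrain, land cover, underwater, ...)
    value_function_map: List[List[float]]      # 109, from the off-board planner 108
    valued_paths: List[Tuple[list, float]]     # 110, paths with associated values
    waypoints: List[Tuple[float, float]]       # 116a
    harvesting_types: List[str]                # 100
    observation_importance: float              # 107
```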
A variety of methods can be used in the local path planner 113. The methods include: a time-based algorithm; a greedy decision-making algorithm; and a Smart Decision Making System.
The time-based system can be used to balance the time spent on exploration for energy sources versus carrying out the mission. After a defined time in energy-source exploration mode, the system can proceed directly to observing the nearest mission point. For example, an ASV using a time-based system could be on an observation mission for a specified amount of time. If, after that time, the ASV is still in observation mode, the system can switch to energy-source exploration for another specified amount of time. The system can switch between exploration mode and observation mode multiple times.
The ASV system can have access to a list of observation points. At a certain time, after completing exploration mode by climbing 118 and decision making 119, the ASV will look for a first observation point (i.e. the closest observation point) 120. The ASV can then decide to go and observe 124 the first observation point 123 and update the observation list 125; or, if the ASV has never explored the area, to explore the area for thermals 126, use them to harvest energy 130, and update the list of thermals 131 as needed. In one embodiment, the balancing decision comes from a timer on board. If the timer times out during observation mode, the system will cause the ASV to switch to exploration mode 126 for a specified time. In one embodiment, the system can repeat this action until the ASV arrives at an observation point. Once the observation point is observed, the system will move that observation point to the end of the list of observation points and set the next observation point as the next goal. This sequence may be repeated. In another embodiment, if the timer times out during exploration mode, the system will cause the ASV to switch to observation mode for a specified time.
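A minimal sketch of the timer-based balancing described above follows; the timer budgets, the vehicle interface (fly_towards, at, observe, explore_for_thermals), and the mission_active callback are assumptions introduced for illustration:

```python
import time

OBSERVATION_BUDGET_S = 300   # assumed time budget per observation phase
EXPLORATION_BUDGET_S = 180   # assumed time budget per exploration phase

def run_time_based(asv, observation_points, mission_active):
    """Alternate between observation and exploration modes on fixed timers."""
    mode = "observation"
    phase_start = time.monotonic()
    while mission_active():
        elapsed = time.monotonic() - phase_start
        if mode == "observation":
            target = observation_points[0]            # next observation goal
            asv.fly_towards(target)
            if asv.at(target):
                asv.observe(target)
                # move the observed point to the end of the list, as described above
                observation_points.append(observation_points.pop(0))
            elif elapsed > OBSERVATION_BUDGET_S:
                mode, phase_start = "exploration", time.monotonic()
        else:  # exploration mode: search for energy sources such as thermals
            asv.explore_for_thermals()
            if elapsed > EXPLORATION_BUDGET_S:
                mode, phase_start = "observation", time.monotonic()
```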
Since the location of the first observation point can be known, a grid map, or value function map, can be created that covers the whole region of interest.
The notion of “how good” 404 here is defined in terms of future rewards that can be expected, or in terms of expected return. Accordingly, value functions are defined with respect to policies. A policy is a mapping from each state and action to the probability of taking that action when in that state.
The method can further comprise defining a set of possible actions. A special action set is defined by the 8 moves possible by the ASV, with all actions having the same probability of being chosen. The ASV can move in 8 directions from any cell to its neighboring cells. It can be noted that edge cases are limited versions of the action set (e.g., the ASV can only move in 3 directions from [0,0]).
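The 8-move action set with its boundary-limited edge cases can be illustrated as follows (a sketch; the function name and grid convention are assumptions):

```python
# The 8-connected action set described above: moves to neighbouring cells,
# clipped at the grid boundary (e.g., only 3 moves are available from [0, 0]).
ACTIONS = [(-1, -1), (-1, 0), (-1, 1),
           ( 0, -1),          ( 0, 1),
           ( 1, -1), ( 1, 0), ( 1, 1)]

def valid_actions(row, col, m, n):
    """Return the subset of the 8 moves that stays inside an m x n grid."""
    return [(dr, dc) for dr, dc in ACTIONS
            if 0 <= row + dr < m and 0 <= col + dc < n]

# Example: valid_actions(0, 0, 10, 10) -> [(0, 1), (1, 0), (1, 1)]  (3 moves)
```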
In order to define a value function equation, one can define a state s ∈ S, where s can be a point in a grid of size m×n, which represents a geographic location. s can store values of weather, probability of energy sources, and the presence or absence of an observation point. Let us define rewards as:
Note that W.P. symbolizes “with probability of”. We can then define a policy Π(s, a) that assigns a probability to each action at each state. For simplicity, we assume it is a uniform distribution policy from here on, but it may be any policy or even be learnt.
Let us define Gt, the expected cumulative reward from step t onward:
The value function of each grid point can then be:
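In the standard reinforcement-learning formulation consistent with the definitions above, and assuming a discount factor γ ∈ [0, 1), the return and value function can be written as:

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

V_{\Pi}(s) = \mathbb{E}_{\Pi}\!\left[ G_t \mid S_t = s \right]
           = \sum_{a} \Pi(s, a) \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma\, V_{\Pi}(s') \bigr]
```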
We use the value function to generate the value function grid maps.
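One way such a grid could be computed is by iterative policy evaluation under the uniform 8-move policy; the following is a minimal sketch, in which the reward grid, discount factor, and convergence tolerance are assumptions:

```python
import numpy as np

def neighbours(r, c, m, n):
    """8-connected neighbours of cell (r, c), clipped at the grid edges."""
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < m and 0 <= c + dc < n]

def value_function_grid(rewards, gamma=0.9, tol=1e-4):
    """Iterative policy evaluation over an m x n reward grid.

    rewards: per-cell rewards (e.g., high at observation points and at cells
             with a high probability of thermals).
    Returns the value function grid used as the value function map.
    """
    m, n = rewards.shape
    V = np.zeros((m, n))
    while True:
        V_new = np.zeros_like(V)
        for r in range(m):
            for c in range(n):
                cells = neighbours(r, c, m, n)
                p = 1.0 / len(cells)                 # uniform policy over valid moves
                V_new[r, c] = sum(p * (rewards[nr, nc] + gamma * V[nr, nc])
                                  for nr, nc in cells)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```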
The above-noted steps can also be applied for multiple sources of input (e.g., with past flight information or wind). In one embodiment, one of the multiple sources of input may include past flight information.
It is important to note that there are many ways of combining the value function map and the probability map information, such as adding a high reward value to the regions with a high probability of thermals and a low value otherwise. In one embodiment, an importance multiplier can be introduced that balances the rewards associated with observation points and thermal updraft points. The importance multiplier value can be tuned based on different missions, where sometimes the exploration of thermals is more important than observing the observation points, and vice versa.
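A hedged sketch of one such combination using an importance multiplier follows; the array names, scaling, and default multiplier are assumptions:

```python
import numpy as np

def combined_reward_map(observation_rewards, thermal_probability, importance=0.5):
    """Blend observation-point rewards with thermal-likelihood rewards.

    importance near 1 favors the observation points; importance near 0 favors
    regions with a high probability of thermals.
    """
    # scale thermal probabilities to a range comparable to the observation rewards
    thermal_rewards = thermal_probability * observation_rewards.max()
    return importance * observation_rewards + (1.0 - importance) * thermal_rewards
```

The resulting reward grid could then feed a value-function computation such as the sketch above.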
The observation points may also be moving. The algorithm can follow any observation point that can be fixed or moving. Moving targets may require an online connection to refresh the map.
The greedy decision making algorithm can balance between the exploration for energy sources and observation mode by a greedy probability.
Once a value function map is defined, the map for the optimal behavior can be defined as follows. Define steps, or actions, that the ASV takes to travel 1 or n cells. The value function map shows the optimal behavior as the action that can be taken by the ASV from each given cell to the highest value neighboring cell.
The greedy decision making algorithm can then balance between observation mode and exploration mode. The algorithm can utilize various maps to decide the behavior of the ASV, such as exploration and observation modes. In one embodiment, the system will choose to visit the highest value neighboring cell as this is the cell that defines a path for optimal behavior. In another embodiment, to achieve more accurate decision making and behavior, a biasing map may be used in combination with the value function map. The biasing map can be used to choose the best valued cell matching the biasing direction. The algorithm can narrow down the potential cells to choose from by utilizing the biasing map.
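For illustration, greedy selection of the next cell, optionally weighted by a biasing map, might look like the following sketch (the function and variable names are assumptions):

```python
import numpy as np

def greedy_next_cell(pos, value_map, bias_map=None):
    """Pick the neighbouring cell with the highest value, optionally weighted
    by a biasing map (e.g., biased toward the observation point)."""
    m, n = value_map.shape
    r, c = pos
    best, best_score = pos, -np.inf
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < m and 0 <= nc < n:
                score = value_map[nr, nc]
                if bias_map is not None:
                    score *= bias_map[nr, nc]   # biasing map narrows the candidates
                if score > best_score:
                    best, best_score = (nr, nc), score
    return best
```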
The greedy decision making algorithm can then start exploration mode to find new energy sources. In one embodiment, the algorithm can include the biasing map in combination with the exploration map and bias the map toward the observation point.
In another embodiment, the algorithm can switch between exploration mode and observation mode with a greedy or a stochastic function. An alpha value can be defined. The alpha value can represent the balance between exploration and going to the observation point. The alpha value varies between 0 and 1. In one embodiment, the alpha value can be defined such that the closer it is to zero, the more it favors exploration mode, and the closer it is to 1, the more it favors observation mode. As time goes on and more exploration is conducted, the alpha value increases toward one. This enforces that the observation point is met. Once the ASV arrives at the observation point, the alpha value goes close to 0 (such as 0.0001). This allows the ASV to go back to exploration. Over time, the alpha value increases again and the ASV returns to observation mode. The increase of the alpha value depends on a hyper parameter. The hyper parameter can be chosen by the user. It can range between 1% and 99%. The preferable range is approximately 5-15%. For instance, a 10% hyper parameter updates the map each time an observation point is visited. The reward for that observation point is lowered close to 0 so that another observation point is favored over the current observation point. This action may be repeated until all the observation points have been observed. In another instance, once the next observation point is achieved, the reward value for the earlier observation point can be restored.
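A minimal sketch of the alpha-based switching described above follows; the update rule, the reset value, and the default 10% step are assumptions consistent with the description:

```python
import random

def choose_mode(alpha):
    """Stochastically choose between exploration and observation.

    alpha near 0 favors exploration; alpha near 1 favors observation."""
    return "observation" if random.random() < alpha else "exploration"

def update_alpha(alpha, arrived_at_observation_point, step=0.10):
    """Illustrative alpha update: the 'step' hyper parameter (e.g., 10%)
    controls how quickly observation mode is favored again."""
    if arrived_at_observation_point:
        return 1e-4                                   # drop near 0 to allow exploration again
    return min(1.0, alpha + step * (1.0 - alpha))     # drift back toward 1 over time
```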
The greedy decision-making algorithm is beneficial as many observation points may be defined. It can be generalized with different observation points having different importance levels. It can include priority and wind map information to generate the value function easily and make the map smarter. Furthermore, it can be adapted to run on the ASV and provide live updates.
The following step function may be used:
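One hypothetical form of such a step function, consistent with the alpha behavior described above (with h denoting the hyper parameter and ε a small value near zero), is:

```latex
\alpha_{t+1} =
\begin{cases}
\epsilon \approx 0, & \text{if an observation point is reached at step } t,\\[4pt]
\min\bigl(1,\ \alpha_t + h\,(1 - \alpha_t)\bigr), & \text{otherwise.}
\end{cases}
```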
The Smart Decision Making System is a combination of the earlier methods. In this algorithm, the actions of the system are biased by a set of rules defined before the flight. A bias algorithm can be used to choose the best action that maximizes the chances of finding thermals and maximizes the observation behavior.
The algorithm can evaluate the readings of the sensors 114 and evaluate its value function map. In one embodiment, the algorithm can be configured to trigger a new global path planner sequence if it believes the original maps are not accurate enough.
The Smart AI decision making system is an on-line decision-making system. It can be placed on board or off board. The algorithm uses the input signals to decide on the next waypoints, bank angle, and speed of the ASV. The AI system 132 first checks 133 the readings from the inputs; if there is any uncertainty or the readings differ from its value function, it will recalculate 134 and update its value function of the environment.
If the readings are in the acceptable range of the value function of the system, the system will generate an observation map (such as a value function map), an uncertainty map, and energy, wind, and glide maps. The AI system 132 then uses the alpha factor that is defined before the flight to combine these maps.
The observation map can also be modified by a time factor. The time factor is a value between 0 and 1. It modifies the rewards of observation points before updating the value function map. If an observation point is observed, its reward goes down.
Since we combine the maps, the generated map 135 is biased towards energy sources, observation points, wind directions, and heading of the ASV.
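For illustration, the weighted combination of these maps might be sketched as follows; the weighting scheme and variable names are assumptions, not the specific combination used by the system:

```python
import numpy as np

def combine_maps(observation_map, uncertainty_map, energy_map, wind_map, glide_map,
                 alpha, time_factor):
    """Combine the maps into a biased map such as the generated map 135.

    time_factor in [0, 1] scales down the rewards of already-observed points;
    alpha (defined before the flight) balances observation against the
    energy/exploration side of the decision.
    """
    observation = observation_map * time_factor
    exploration = (energy_map + wind_map + glide_map + uncertainty_map) / 4.0
    return alpha * observation + (1.0 - alpha) * exploration
```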
The Smart AI decision making system 132 then calculates the trajectory and direction for the next point to travel to, and generates a waypoint 116, bank angle, and speed.
Reinforcement learning (RL) agent Decision Making System
The RL system 136 is similar to the Smart Decision-Making System.
The RL system 136 can be trained or engineered to make the decision. One method of training is to let the RL agent be trained in a simulation environment. The evaluation can be done by human feedback or by comparing the results with other systems' results.
A reward function can also be defined to train the RL agent that evaluates how much energy was used, whether observation points were visited, and the time spent on them. The RL system can use a deep neural network as well. The RL system makes the decision on the next few points based on the input signals and the processed data.
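As an illustrative sketch only, such a reward function could weight the three quantities named above; the weights and function signature are assumptions:

```python
def rl_reward(energy_used, points_visited, time_on_points,
              w_energy=1.0, w_visit=10.0, w_time=0.1):
    """Reward visited observation points and observation time; penalize energy use."""
    return w_visit * points_visited + w_time * time_on_points - w_energy * energy_used
```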
For simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the examples described herein. However, it will be understood by those of ordinary skill in the art that the examples described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the examples described herein. Also, the description is not to be considered as limiting the scope of the examples described herein.
It will be appreciated that the examples and corresponding diagrams used herein are for illustrative purposes only. Different configurations and terminology can be used without departing from the principles expressed herein. For instance, components and modules can be added, deleted, modified, or arranged with differing connections without departing from these principles.
It will also be appreciated that any module or component exemplified herein that executes instructions may include or otherwise have access to computer readable media such as storage media, computer storage media, or data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, module, or both. Any such computer storage media may be part of the system, any component of or related thereto, etc., or accessible or connectable thereto. Any application or module herein described may be implemented using computer readable/executable instructions that may be stored or otherwise held by such computer readable media.
The steps or operations in the flow charts and diagrams described herein are just for example. There may be many variations to these steps or operations without departing from the principles discussed above. For instance, the steps may be performed in a differing order, or steps may be added, deleted, or modified.
Although the above principles have been described with reference to certain specific examples, various modifications thereof will be apparent to those skilled in the art as outlined in the appended claims.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CA2021/051871 | 12/22/2021 | WO |
Number | Date | Country
---|---|---
63130308 | Dec 2020 | US