Macro- and micro-economic trends, from infrastructure availability to energy consumption, are often affected by weather. Similarly, human behavior is often (consciously or subconsciously) affected by weather. Accordingly, businesses and other organizations seek accurate forecasts to predict everything from overall economic trends to demand for specific products.
Conventional economic forecasting systems analyze past economic behavior and construct economic forecasting models to predict future economic behavior. Weather databases include historical weather data that may be correlated with past events. Accordingly, some conventional economic forecasting systems may incorporate past weather data and model economic behavior as a function of weather conditions so as to predict future economic behavior in view of forecasted weather and climate conditions.
Conventional economic forecasting systems model an economic metric of interest by analyzing all of the available metrics that may be correlated with that economic metric of interest, determining the metrics whose correlation to the economic metric of interest is statistically significant, and generating a model that forecasts the economic metric of interest as a function of all of the metrics with a statistically significant correlation to the economic metric of interest.
However, conventional systems are poorly constructed to model past events based on past weather metrics because of the multicollinearity of past weather metrics. Multicollinearity is a phenomenon that occurs when two or more metrics are moderately or highly correlated with one another. In the fields of meteorology and climate science, the number of weather metrics has increased substantially. The weather database currently available from AccuWeather Enterprise Solutions of State College, Pa., for example, includes more than 300 weather metrics, including first-order derivatives, second order derivatives, etc. Some of those additional weather metrics are more predictive of economic trends than simpler weather metrics that may be considered by simpler economic forecasting systems. However, with more than 300 available weather metrics, multicollinearity occurs frequently as some of those weather metrics are highly related measurements of the same phenomena. For example, the daily high temperature, low temperature, and average temperature are all different metrics. However, they are all highly correlated to each other as they are all measuring heat present in the atmosphere at a specific location on a specific day.
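The multicollinearity of the temperature metrics mentioned above can be illustrated with a short sketch. The readings below are hypothetical values chosen for illustration (they are not taken from any weather database), and the helper name `pearson` is likewise an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical daily readings (degrees F) for one location; illustrative only.
high = [41, 45, 52, 60, 58, 66, 71, 75, 70, 63]
low = [28, 30, 35, 44, 41, 50, 55, 58, 54, 47]
avg = [(h + l) / 2 for h, l in zip(high, low)]

pairs = {
    ("high", "low"): pearson(high, low),
    ("high", "avg"): pearson(high, avg),
    ("low", "avg"): pearson(low, avg),
}
for names, r in pairs.items():
    print(names, round(r, 3))
```

In this made-up sample every pairwise coefficient is close to 1, which is the situation the paragraph describes: three nominally different metrics carrying largely the same information.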
Because of the high multicollinearity of historical weather metrics, conventional economic forecasting systems generate overfitted models or underfitted models. Overfitting is the production of an analysis that corresponds too closely or exactly to a particular set of data and may therefore fail to reliably predict future observations. In essence, an overfitted model conforms to the residual variation (i.e., the noise) in the past data, which is not expected to occur in future data, leading to an inaccurate forecast. Underfitting occurs when a statistical model cannot adequately capture the underlying structure of the data. A simple example of underfitting is fitting a linear model to non-linear data, which would tend to have poor predictive performance. However, an underfitted model can be any model where some parameters or terms that would appear in a correctly specified model are missing.
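The underfitting example given above (a linear model fitted to non-linear data) can be sketched as follows. The data are synthetic and noise-free, and the helper name `fit_line` is an assumption made for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

# Synthetic, purely quadratic data (no noise).
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
# Sum of squared errors of the straight-line fit.
sse_line = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
print(a, b, sse_line)
```

The fitted line is flat (slope 0, intercept 4) and leaves a squared error of 84 even though the data contain no noise at all: the missing quadratic term is exactly the kind of omitted structure that characterizes an underfitted model.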
Therefore, there is a need for an economic forecasting system that forecasts future economic trends based on forecasted weather metrics without developing an overfitted model or an underfitted model due to the high multicollinearity of historical weather metrics.
In order to overcome those and other technical problems with conventional forecasting systems, an economic forecasting system is provided that analyzes weather metrics that are divided into groups (based on the multicollinearity of the weather metrics in each group), identifies the most statistically significant weather metrics from each group, generates a statistical model using the one or more most statistically significant weather metrics from each group, receives forecasted weather metrics, and forecasts an economic performance metric of interest based on the statistical model and the forecasted weather metrics.
In contrast to the underfitted or overfitted models generated using conventional methods, analyzing weather metrics that are divided into groups based on the multicollinearity of those weather metrics causes the disclosed system to efficiently identify the weather metrics that are most predictive of the future economic trends, even using a large number of weather metrics that are computationally expensive to test.
Aspects of exemplary embodiments may be better understood with reference to the accompanying drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of exemplary embodiments.
Reference to the drawings illustrating various views of exemplary embodiments of the present invention is now made. In the drawings and the description of the drawings herein, certain terminology is used for convenience only and is not to be taken as limiting the embodiments of the present invention. Furthermore, in the drawings and the description below, like numerals indicate like elements throughout.
As shown in
The historical economic performance database 120 stores geo-located and time-indexed historical economic performance metrics 122. Each of the historical economic performance metrics 122 describes one or more events that took place in a specific location 124 at a specific time 126. For each geo-located and time-indexed historical economic performance metric 122, the historical economic performance database 120 stores the magnitude of each metric 122, the location 124, and the time 126. The location 124 may be expressed as latitude and longitude, municipality (e.g., city, county, state, etc.), region, etc. The time 126 may be the date, the specific time of day on that date, etc.
The historical economic performance metrics 122 may include retail sales metrics (e.g., sales as dollars, point-of-sale quantities, counts of trends, sales of items by specific SKU numbers, etc.), infrastructure metrics (e.g., location availability, power outages, etc.), commodities metrics (e.g., energy usage, demand for other commodities), human resources metrics (e.g., employee availability), etc.
The historical economic performance metrics 122 may be received from third party sources, including governmental sources, such as the U.S. National Oceanic and Atmospheric Administration (NOAA), the U.S. National Aeronautics and Space Administration (NASA), the U.S. Health Resources & Services Administration (HRSA), the U.S. Bureau of Economic Analysis (BEA), and the U.S. Bureau of Labor Statistics (BLS), as well as private sources of economic data, such as Drought Monitor, the National Snow and Ice Data Center (NSIDC), ESRI Marketplace data, the Cornell Institute for Social and Economic Research (CISER), TWITTER and FACEBOOK data, financial market data, and power outage data. (FACEBOOK is a trademark of Facebook, Inc. TWITTER is a trademark of Twitter, Inc.) Most often, however, the economic forecasting system 100 is used to forecast economic trends for a specific client based on historical economic performance metrics 122 received from that client.
The historical weather database 140 stores geo-located and time-indexed historical weather metrics 142. Again, each of the geo-located and time-indexed historical weather metrics 142 describes a weather or environmental condition in a specific location 144 at a specific time 146. For each geo-located and time-indexed historical weather metric 142, the historical weather database 140 stores the magnitude of each metric 142, the location 144, and the time 146. The location 144 may be expressed as latitude and longitude, municipality (e.g., city, county, state, etc.), region, etc. The time 146 may be the date, the specific time of day on that date, etc.
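One way to picture the geo-located and time-indexed records held in the two historical databases is sketched below. The class name `Observation`, its field names, the function `align`, and the sample values are all assumptions made for illustration; the disclosure does not prescribe a storage schema, and matching records on an exact (location, date) key is only one possible way to line up economic and weather data.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Observation:
    metric: str       # e.g. "retail_sales" or "highest_temperature"
    location: str     # could equally be a (lat, lon) pair or a region code
    when: date        # could also carry a time of day
    magnitude: float

def align(economic, weather):
    """Pair each economic observation with the weather metrics that
    share its location and date (hypothetical exact-key join)."""
    by_key = {}
    for w in weather:
        by_key.setdefault((w.location, w.when), {})[w.metric] = w.magnitude
    rows = []
    for e in economic:
        key = (e.location, e.when)
        if key in by_key:
            rows.append((e.magnitude, by_key[key]))
    return rows

# Made-up sample records.
d1, d2 = date(2017, 2, 1), date(2017, 2, 2)
economic = [
    Observation("retail_sales", "State College", d1, 1200.0),
    Observation("retail_sales", "State College", d2, 950.0),
]
weather = [
    Observation("highest_temperature", "State College", d1, 41.0),
    Observation("highest_wind_gust", "State College", d1, 22.0),
    Observation("highest_temperature", "Pittsburgh", d2, 39.0),
]
rows = align(economic, weather)
print(rows)
```

Only the first sales record finds matching weather data here, because the second day's weather observation belongs to a different location.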
The historical weather metrics 142 may include temperature metrics, including highest temperature, lowest temperature, average daily temperature (all hours), highest temperature departure from normal, lowest temperature departure from normal, average daily temperature departure from normal, average daily temperature (highest/lowest), etc.; dew point, relative humidity, soil temperature and moisture metrics, including maximum dew point temperature, minimum dew point temperature, average dew point temperature, maximum relative humidity, minimum relative humidity, average relative humidity, maximum wet bulb temperature, minimum wet bulb temperature, average wet bulb temperature, soil moisture, etc.; atmospheric pressure metrics, including highest pressure, lowest pressure, average pressure, etc.; cooling, heating, effective, growing, and freezing degree days metrics, including cooling degree days, heating degree days, effective degree days, growing degree days, freezing degree days, etc.; wind metrics, including highest sustained wind speed, lowest sustained wind speed, average sustained wind speed, highest wind gust, etc.; solar irradiance metrics, including maximum solar radiance, minimum solar radiance, average solar radiance, total solar radiance, etc.; sunshine metrics, including total minutes of sunshine, minutes of sunshine possible, percent of sunshine possible, etc.; precipitation metrics, including observed daily water equivalent, percent of normal daily water equivalent, etc.; snow, freeze, ice, and sleet metrics, including snowfall, snow at 0.50 inches, snow on ground, snow within 35 miles, etc.; and spring, tropical storms, hurricane, and visibility metrics, including average visibility, visibility at 0.50 miles, visibility at 2.00 miles, etc. The historical weather metrics 142 may include first-order derivatives, second-order derivatives, etc.
The historical weather metrics 142 may include proprietary weather metrics, such as the average daily REALFEEL temperature, the maximum daily REALFEEL temperature, the minimum daily REALFEEL temperature, etc. (REALFEEL is a registered service mark of AccuWeather, Inc.)
The historical weather metrics 142 may be received, for example, from AccuWeather, Inc., AccuWeather Enterprise Solutions, Inc., the National Weather Service (NWS), the National Hurricane Center (NHC), Environment Canada, other governmental agencies (such as the U.K. Meteorologic Service, the Japan Meteorological Agency, etc.), private companies (such as Vaisala's U.S. National Lightning Detection Network, Weather Decision Technologies, Inc.), individuals (such as members of the Spotter Network), etc. The historical weather metrics 142 may also include information regarding environmental conditions received, for example, from the U.S. Environmental Protection Agency (EPA) and/or information regarding natural hazards (such as earthquakes) received, for example, from the U.S. Geological Survey (USGS).
The weather forecast database 160 stores forecasted weather metrics 162. The forecasted weather metrics 162 include forecasted weather and environmental conditions for specific locations 164 and specific times 166. The locations 164 may be expressed as latitude and longitude, municipality (e.g., city, county, state, etc.), region, etc. The times 166 may be the date, the specific time of day on that date, etc. The forecasted weather metrics 162 may be short term forecasted weather metrics, long term forecasted weather metrics, long term climatological metrics, etc.
The forecasted weather metrics 162 include the same weather metrics as the historical weather metrics 142 and may be received from the same sources. The economic forecasting system 100 may also include a weather forecasting engine (not shown) that generates some or all of the forecasted weather metrics 162, for example using one or more mathematical models of the atmosphere and oceans to predict future weather conditions based on current weather conditions.
The economic forecast engine 180 builds a statistical model for each economic performance metric 122 of interest based on correlations between the geo-located and time-indexed historical economic performance metric 122 of interest and the geo-located and time-indexed historical weather metrics 142. As described in detail below, the economic forecast engine 180 identifies historical weather metrics 142 that correlate with an historical economic performance metric 122 of interest, such that the model can be generated and used to forecast the economic performance metric 122 of interest based on the forecasted weather metrics 162.
Notably, the economic forecast engine 180 does not analyze all of the weather metrics 142 together or build a statistical model using all of the historical weather metrics 142 found to be statistically significant because, as described in the background of this disclosure, doing so would result in an overfitted or underfitted model, in part because of the multicollinearity of the historical weather metrics 142.
Instead, the economic forecast engine 180 separately analyzes groups of historical weather metrics 142 and identifies one or more of the most statistically significant historical weather metrics 142 in each group. Each group includes historical weather metrics 142 that have been grouped together based on their multicollinearity.
In one exemplary embodiment, the economic forecast engine 180 uses the following ten groups of historical weather metrics 142:
The historical weather metrics 142 are segregated into groups (for example, as shown above) based on their multicollinearity. Specifically, the weather metrics 142 are segregated into groups such that the historical weather metrics 142 with the highest absolute Pearson correlation coefficient are in the same group. Table 1 shows rules of thumb when using Pearson correlation coefficients to determine multicollinearity.
Table 2 shows a simplified example of separating historical weather metrics 142 into groups based on Pearson correlation coefficients, using only three temperature metrics (highest temperature, lowest temperature, and average temperature) and three wind metrics (highest wind speed, lowest wind speed, and average wind speed).
As shown in Table 2, the highest temperature, the lowest temperature, and the average temperature all have a strong (in this instance, positive) correlation with each other and are therefore grouped together (as temperature metrics). Similarly, the highest wind speed, the lowest wind speed, and the average wind speed all have moderate-to-strong (in this instance, positive) correlations with each other and are therefore grouped together (as wind metrics). Conversely, none of the temperature metrics have even a weak correlation (either positive or negative) with any of the wind metrics. Accordingly, the example temperature metrics and the example wind metrics are separated into different groups.
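The grouping of the three temperature metrics and three wind metrics can be sketched as follows. This is only one possible reading of the grouping step: it links any two metrics whose absolute Pearson coefficient meets a cutoff and treats each connected set as a group. The cutoff of 0.5, the function names, and the sample series are assumptions made for illustration; the series were deliberately constructed so that the temperature and wind families are uncorrelated with each other.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def group_by_correlation(metrics, threshold):
    """Put metrics with |r| >= threshold in the same group
    (connected components found with a small union-find)."""
    parent = {name: name for name in metrics}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(metrics, 2):
        if abs(pearson(metrics[a], metrics[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in metrics:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Constructed sample: temperatures trend upward, winds follow a mirrored pattern.
metrics = {
    "highest_temperature": [50, 54, 57, 62, 64, 69, 72, 76],
    "lowest_temperature": [35, 38, 42, 45, 47, 50, 54, 57],
    "average_temperature": [42.5, 46, 49.5, 53.5, 55.5, 59.5, 63, 66.5],
    "highest_wind_speed": [20, 9, 25, 12, 12, 25, 9, 20],
    "lowest_wind_speed": [7, 2, 8, 4, 4, 8, 2, 7],
    "average_wind_speed": [13, 5, 17, 8, 8, 17, 5, 13],
}
groups = group_by_correlation(metrics, threshold=0.5)
print(groups)
```

With these series the function returns two groups, one holding the three temperature metrics and one holding the three wind metrics.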
For each group of historical weather metrics 142, a correlation analysis is performed in step 210. The correlation analysis determines the Pearson correlation coefficient and statistical significance (e.g., probability value or “p-value”) of each historical weather metric 142 with respect to the economic performance metric 122 of interest.
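The per-group correlation analysis of step 210 might look like the sketch below. The function names and the sample values are hypothetical. The p-value is computed with the Fisher z-transform and a normal approximation, which is a stand-in chosen here so that the example needs only the standard library; the disclosure does not say which significance test is used.

```python
from math import atanh, sqrt
from statistics import NormalDist

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z-transform
    (normal approximation; an assumption, not the patent's stated test)."""
    if abs(r) >= 1.0:
        return 0.0
    z = abs(atanh(r)) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(z))

def correlation_analysis(group, target):
    """Return {metric name: (r, p)} for every metric in one group."""
    n = len(target)
    results = {}
    for name, values in group.items():
        r = pearson(values, target)
        results[name] = (r, approx_p_value(r, n))
    return results

# Hypothetical economic metric of interest and one group of weather metrics.
sales = [10, 12, 15, 11, 18, 20, 17, 22, 25, 21, 27, 30]
temperature_group = {
    "highest_temperature": [50, 53, 58, 52, 63, 66, 61, 69, 74, 68, 77, 82],
    "lowest_temperature_departure": [5, 9, 5, 9, 5, 9, 5, 9, 5, 9, 5, 9],
}
results = correlation_analysis(temperature_group, sales)
for name, (r, p) in results.items():
    print(name, round(r, 3), round(p, 4))
```

In this made-up sample the first metric tracks sales closely and is highly significant, while the second alternates independently of sales and is not.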
Up to a predetermined number of the most statistically significant historical weather metrics 142 are selected from each group of historical weather metrics 142 in step 220. The processes 210 and 220 for performing a correlation analysis and selecting the most statistically significant historical weather metrics 142 from each group are described in detail with reference to
A statistical model is generated using the selected historical weather metrics 142 in step 230. The forecasting model may be generated using regression analysis (e.g., linear, logistic, best subsets, stepwise, etc.), decision trees (e.g., C5, CART, CHAID, etc.), neural networks (Multilayer Perceptron, Radial Basis Function, etc.) or other artificial intelligence, etc.
Forecasted weather metrics 162 are received in step 240.
A forecast for the economic performance metric 122 of interest is generated in step 250 based on the statistical model generated in step 230 and the forecasted weather metrics 162 received in step 240.
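Steps 230 through 250 can be sketched with ordinary linear regression, one of the model families listed above. Everything concrete here is an assumption made for illustration: the function names, the two selected metrics, and the synthetic history, which was generated from an exact linear rule (sales = 100 + 2 × temperature departure − 3 × highest wind speed) so that the result is easy to check by hand.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_linear_model(rows, target):
    """Step 230: least-squares coefficients [intercept, b1, b2, ...]
    obtained from the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(k)]
    return solve(xtx, xty)

def forecast(coefs, row):
    """Step 250: apply the fitted model to forecasted weather metrics."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Synthetic history: (temperature departure, highest wind speed) -> sales.
history = [(-4, 10), (0, 12), (3, 8), (5, 15), (-2, 20), (7, 11)]
sales = [62, 64, 82, 65, 36, 81]

coefs = fit_linear_model(history, sales)
forecasted_weather = (4, 9)          # step 240: hypothetical forecasted metrics
prediction = forecast(coefs, forecasted_weather)
print([round(c, 6) for c in coefs], round(prediction, 6))
```

Because the synthetic history is exactly linear, the fit recovers the generating coefficients (100, 2, −3) up to rounding and the forecast for the hypothetical weather is 81. Real data would of course leave residual error, and a different model family from the list above could be substituted for the regression.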
The forecast generated in step 250 is output in step 260. The forecast may be output to a user via a graphical user interface. Additionally or alternatively, the forecast may be output to a communication network for transmittal to a client computing device (for example, the source of the economic performance metric 122 of interest).
As shown in
For each group of historical weather metrics 142, a correlation analysis is performed to identify the Pearson correlation coefficient and statistical significance of each historical weather metric 142. Specifically, for Group A, a correlation analysis is performed in step 210 to identify the Pearson correlation coefficient and statistical significance of each of the historical weather metrics A1, A2, etc. in Group A with respect to the economic performance metric 122 of interest. Similarly, for Group B, a correlation analysis is performed in step 211 to identify the Pearson correlation coefficient and statistical significance of each of the historical weather metrics B1, B2, etc. in Group B with respect to the economic performance metric 122 of interest. A similar correlation analysis is performed in steps 212 through 219 for each of the historical weather metrics 142 in Groups C through J.
Table 3 shows an example identifying the Pearson correlation coefficients and statistical significance of seven temperature metrics (Group A in the example above).
For each group of historical weather metrics 142, up to n of the most significant historical weather metrics 142 are selected. Specifically, for Group A in step 220, the nA historical weather metrics 142 with the highest absolute Pearson correlation coefficient are selected, provided there are nA historical weather metrics 142 with a statistical significance within a predetermined threshold. (The predetermined threshold may be, for example, p≤0.05, more preferably p≤0.01, or most preferably p≤0.001.) Similarly, for Group B in step 221, the nB historical weather metrics 142 with the highest absolute Pearson correlation coefficient are selected (provided there are nB historical weather metrics 142 with a statistical significance within the predetermined threshold). A similar selection process is performed in steps 222 through 229 to select up to nC metrics from Group C, select up to nD metrics from Group D, etc., and to select up to nJ metrics from Group J.
Referring back to the example in Table 3, if the number nA of historical weather metrics 142 selected from Group A is two, then the economic forecast engine 180 would select highest temperature departure from normal and average daily temperature departure from normal in order to build the statistical model.
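The selection rule of step 220 can be sketched as a filter followed by a ranking. The function name `select_top` is an assumption, and the coefficient and p-values below are invented for illustration (they are not the values of Table 3); they were chosen so that the outcome matches the two metrics named above.

```python
def select_top(results, n, alpha):
    """Keep metrics with p <= alpha, rank by |r| (largest first),
    and return at most n metric names."""
    significant = [(name, r) for name, (r, p) in results.items() if p <= alpha]
    significant.sort(key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in significant[:n]]

# Invented (r, p) pairs for seven temperature metrics.
group_a = {
    "highest temperature": (0.41, 0.004),
    "lowest temperature": (0.33, 0.02),
    "average daily temperature (all hours)": (0.38, 0.008),
    "highest temperature departure from normal": (0.62, 0.0001),
    "lowest temperature departure from normal": (0.12, 0.40),
    "average daily temperature departure from normal": (0.57, 0.0003),
    "average daily temperature (highest/lowest)": (0.39, 0.007),
}

selected = select_top(group_a, n=2, alpha=0.01)
print(selected)

# Asking for more metrics than pass the threshold returns only those that pass.
all_passing = select_top(group_a, n=10, alpha=0.01)
print(len(all_passing))
```

The second call shows the "up to n" behavior: only five of the seven invented metrics meet p≤0.01, so five are returned even though ten were requested.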
The number of historical weather metrics n selected from each group may vary from group to group. Using the specific ten groups of the historical weather metrics 142 described above, in the most preferred embodiment, the economic forecast engine 180 selects the two most significant temperature metrics (Group 1), the two most significant dew point, relative humidity, soil temperature and moisture metrics (Group 2), the one most statistically significant atmospheric pressure metric (Group 3), the two most statistically significant cooling, heating, effective, growing, and freezing degree days metrics (Group 4), the two most statistically significant wind metrics (Group 5), the one most statistically significant solar irradiance metric (Group 6), the two most statistically significant sunshine metrics (Group 7), the two most statistically significant precipitation metrics (Group 8), the three most statistically significant snow, freeze, ice, and sleet metrics (Group 9), and the three most statistically significant tropical storms, hurricane, and visibility metrics (Group 10).
As described above, the economic forecast engine 180 uses the selected historical weather metrics 142 from all of the groups (in the most preferred embodiment, the 20 most statistically significant historical weather metrics 142 with respect to the economic performance metric 122 of interest) and generates a statistical model to forecast the economic performance metric 122 of interest.
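The per-group selection counts of the most preferred embodiment can be written out as a small configuration, which also confirms the total of 20. The dictionary name, the helper `apply_quota`, and the shortened group labels are assumptions made for illustration; the counts themselves are taken from the description above.

```python
# Maximum number of metrics selected from each of the ten groups.
group_quota = {
    "temperature": 2,
    "dew point, relative humidity, soil temperature and moisture": 2,
    "atmospheric pressure": 1,
    "degree days": 2,
    "wind": 2,
    "solar irradiance": 1,
    "sunshine": 2,
    "precipitation": 2,
    "snow, freeze, ice, and sleet": 3,
    "tropical storms, hurricane, and visibility": 3,
}

def apply_quota(ranked_by_group, quota):
    """Take at most quota[group] names from each group's ranked list
    and concatenate them into the model's input metrics."""
    chosen = []
    for group, names in ranked_by_group.items():
        chosen.extend(names[: quota.get(group, 0)])
    return chosen

total = sum(group_quota.values())
print(total)  # 20

# Hypothetical ranked lists for two groups only.
ranked = {
    "temperature": ["highest temperature departure from normal",
                    "average daily temperature departure from normal",
                    "highest temperature"],
    "atmospheric pressure": ["lowest pressure", "average pressure"],
}
model_inputs = apply_quota(ranked, group_quota)
print(model_inputs)
```

A group may contribute fewer metrics than its quota when fewer of its metrics meet the significance threshold, so 20 is an upper bound on the number of model inputs.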
As shown in
Each of the client computers 422, 424, etc., may be any suitable hardware computing device configured to send and/or receive data via the networks 432, 436, etc. Each of the client computers 422, 424, etc., may be, for example, a network-connected computing device such as a server, a personal computer, a notebook computer, a smartphone, a personal digital assistant (PDA), a tablet, a network-connected vehicle, etc. Each of the client computers includes an internal storage device and a hardware processor, such as a central processing unit (CPU). Some or all of the client computers 422, 424, etc., may include output devices, such as a display, and input devices, such as a keyboard, mouse, touchpad, etc. Each of the one or more servers 442, 444, etc., may be any suitable hardware computing device configured to send and/or receive data via the networks 434, 436, etc. Each of the one or more servers 442, 444, etc., may be, for example, an application server and a web server that hosts websites accessible by the client-side computing devices 420. Each of the one or more servers 442, 444, etc., includes an internal non-transitory storage device and at least one hardware computer processor. Each of the non-transitory computer-readable storage media 426 and 446 may include hard disks, solid-state memory, etc. The one or more networks 432, 434, 436, etc., may include any combination of the internet, cellular networks, wide area networks (WAN), local area networks (LAN), etc. Communication via the network(s) 432, 434, 436, etc., may be realized by wired and/or wireless connections.
Referring back to
In the architecture 400 illustrated in
The architecture 500 illustrated in
The architecture 600 illustrated in
Since the currently available weather database has over 300 historical weather metrics 142, the dimensional reduction process described above allows the economic forecasting system 100 to uncover significant metrics 142 that may potentially be lost when tested with all metrics together (as may be done with conventional economic forecasting systems), improving the accuracy of the statistical model used to forecast the economic performance metric 122 of interest. As an example, when wind speed, temperature, and humidity are tested together, temperature and humidity may be statistically significant due to their strong interaction, which overshadows the effect of wind speed on the economic performance metric 122 of interest. However, when the economic forecasting system 100 tests wind speed in conjunction with other wind speed metrics as described above, the economic forecasting system 100 has found that the highest sustained wind speed and wind gust speed are statistically significant with certain economic performance metrics 122.
The economic forecasting system 100 generates highly accurate forecasts of economic trends by reducing the historical weather metrics 142 to a more manageable set, without sacrificing the accuracy of future models, and performing analytical processes with the most statistically significant historical weather metrics 142 from each group. The disclosed economic forecasting system 100 also provides repeatable results for the user for performing a variety of analytical projects.
In general, the large number of historical weather metrics 142 available for testing is computationally expensive to test. By testing the historical weather metrics 142 in separate groups (and later combining the most statistically significant historical weather metrics 142 from each of the groups to generate a statistical model), the economic forecasting system 100 is able to efficiently determine which of the historical weather metrics 142 from each group have a significant relationship with the economic performance metric 122 of interest.
The economic forecasting system 100 is also able to provide clients with the most accurate insights and forecasts of economic trends so that they can utilize forecasted weather metrics 162 to capture future sales lifting events and minimize sales depressing events. The economic forecasting system 100 allows for more effective planning and increased sales across all product lines and geographical regions.
The economic forecasting system 100 overcomes a technical problem with conventional economic forecasting systems that may analyze historical weather metrics 142 together and therefore generate underfitted and/or overfitted statistical models, in part due to the high multicollinearity of historical weather metrics 142. By analyzing historical weather metrics 142 together, a conventional economic forecasting system may generate an underfitted statistical model that forecasts an economic performance metric 122 of interest as a function of only the following five historical weather metrics 142:
By contrast, the economic forecasting system 100, using the dimension reduction process described above, is able to identify historical weather metrics 142 that have a more subtle relationship with the economic performance metric 122 of interest, which are lost when historical weather metrics 142 are analyzed together. Accordingly, the economic forecasting system 100 using the dimension reduction process described above generates a statistical model that forecasts an economic performance metric 122 of interest as a function of the following 13 historical weather metrics 142:
While preferred embodiments have been set forth above, those skilled in the art who have reviewed the present disclosure will readily appreciate that other embodiments can be realized within the scope of the invention. For example, disclosures of specific numbers of hardware components, software modules and the like are illustrative rather than limiting. Therefore, the present invention should be construed as limited only by the appended claims.
| Number | Date | Country |
|---|---|---|---|
| 62460596 | Feb 2017 | US |