Short-Term Forecasting Methods

This year, the Spring Forecasting Experiment is focusing on the Day 1 time period more than ever before, eschewing the long lead-time forecasts that we have made in previous years in favor of homing in on timing information and allowing participants to delve into the data. Since more data than ever before is available within the drawing tool where participants draw their forecasts, we’re excited to see how participants probe the new data within the various ensemble subsets.

One short-term experimental forecast product being generated on the Innovation Desk this year is the Potential Severe Timing (PST) area, which indicates the 4-hr period in which severe weather is expected to occur within the general area of 15% probability of severe weather. By identifying the timing of the severe event and displaying all of the timing contours on one graphic, we hope the end product will be valuable to emergency managers and broadcasters for their advance planning. Small groups of participants generate these forecasts around subsets of the CLUE and HREFv2 ensembles, meaning that on any given day we’ll ideally have 5 separate sets of PSTs. After the participants separate into their small groups and issue their forecasts, we ask them to come back together and brief one another on what their particular ensemble subset was doing. This way, each group of participants can delve into the data from their subset more deeply than if the activity were to take place as one large group. This briefing period also exposes the participants to different lines of reasoning in issuing their forecasts, and has thus far sparked several good discussions.

Here are the PSTs from 3 May 2018, or Thursday of last week:

The different ensemble subset groups compose the top row and the left and middle sections of the bottom row, while the bottom right-hand panel shows the forecast from the expert forecaster facilitator on the Innovation Desk. Several different strategies are evident within the panels, including some groups that chose not to indicate timing areas for all of the 15% area of our full-period outlook (shown below).

The reasoning from the groups for their different areas gave insight into the model performance as well as the different forecasting strategies employed by the different groups of people. The group using the HREFv2 decided not to use the NMMB member when generating their forecasts, because its depiction of morning convection was so poor. The HRRRE group had very large areas, which they attributed to the large spread within the HRRRE. The NCAR group decided to discount the guidance in the north of the domain because of erroneous convection there. Instead, they felt more confident in the southern areas where the ensemble was producing supercells. Their group thought that the thermodynamics of the northern area were less conducive to supercellular convection. The group using the mixed physics ensemble from CAPS placed their first area based on where they thought convective initiation would occur, indicating that they thought convection would quickly become severe. Their southern PST was very late, to cover any severe threat overnight, but they considered that it might be more of a flood threat (which we do not forecast for in the Spring Forecasting Experiment). The stochastic physics group (another ensemble run by CAPS), on the other hand, had an ensemble which showed almost no signal in the southern area of interest. It also showed a later signal than the other ensembles, contributing to the spread in the time of the first PST.

All of these details came out during the discussion of the PSTs, after participants dove into the data from their subensemble. How did the PSTs do? Here’s a snapshot of the PSTs with reports from 18-22 UTC overlaid:

Ideally, all of the reports would fall into the 18-22 UTC contours, which mostly occurred for the expert forecaster and did occur for the HRRRE and Mixed Physics groups, although both groups had large areas of false alarm. Here’s a similar image, but showing reports from 22-02 UTC:

At this point in time, all groups missed the activity in Kansas, although some groups captured most of the reports within a 22-02 UTC window.
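For readers who want to reproduce the timing half of this check, here is a minimal sketch of sorting report times into 4-hr windows. It only handles the clock arithmetic (including the 22-02 UTC window that wraps past 00 UTC), not whether a report falls inside a drawn contour. The function names and the `PST_WINDOWS` list are hypothetical and are not part of the experiment's tools.

```python
def in_utc_window(hour, start, end):
    """Return True if hour (UTC) falls in [start, end), allowing wrap past 00 UTC."""
    hour, start, end = hour % 24, start % 24, end % 24
    if start <= end:
        return start <= hour < end
    # Window crosses midnight, e.g. 22-02 UTC.
    return hour >= start or hour < end


def window_for_report(hour, windows):
    """Return the first (start, end) window containing the report hour, or None."""
    for start, end in windows:
        if in_utc_window(hour, start, end):
            return (start, end)
    return None


# Only the two windows discussed in the post; other windows would be additions.
PST_WINDOWS = [(18, 22), (22, 2)]

print(window_for_report(19, PST_WINDOWS))  # a report at 19 UTC
print(window_for_report(1, PST_WINDOWS))   # a report at 01 UTC, after the day rollover
```

A report at 12 UTC falls in neither window and returns `None`, which would count as a report outside all of the timing areas considered here.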

The day after the forecasts, participants are able to go through and give ratings based on the reports that have come in, and choose the group’s forecast that they thought performed the best. Who performed the best for this case? 3 votes for HREFv2, 2 votes each for the HRRRE and the CAPS Stochastic Physics ensemble, and one vote each for the CAPS Mixed Physics and the NCAR ensemble group. Clearly, the complexity of this case provided plenty of nuances to evaluate, and I would bet that more complex cases such as this are on the way. After all, we’ve only just begun Week 2 of the 2018 SFE!

Sneak Peek Part 3: Modeled vs Observed reports

I went ahead and used some educated guesses to develop proxies for severe storms in the model. But how do those modeled reports compare to observed reports? This question, at least the way it is addressed here, yields an interesting result. Let's go to the figures:


The two images show the bar chart of all the dates on the left, with the modeled reports (top), observed reports close to modeled storms (middle), and the natural log of the pixel count of each storm (i.e., its area; bottom) on the right. The first image has the modeled storm reports selected, and it should be pretty obvious that I have chosen unwisely (either the variable or the value) for my hail proxy (the reports with a 2 in the string). Interestingly, the area distribution is skewed to the right; that is, very large objects tend to be associated with modeled severe storms.

Also note that modeled severe storms are largest in the ensemble for 24 May with 27 Apr coming in 6th.  24 May appears first in percent of storms on that date with the 27 Apr outbreak coming in 15th place (i.e. having a lot of storms that are not severe).


Changing our perspective and highlighting the observed reports that are close to modeled storms, the storm area distribution shifts to the left, toward the smallest storm areas.

Among the modeled storms that verified, 25 May has the most observed reports close by, followed by 27 Apr. 24 May lags behind in 5th place. In a relative sense, 27 Apr and 25 May switch places, with 24 May coming in 9th place.

These unique perspectives highlight two subtle but interesting points:
1. Modeled severe storms are more typically larger (i.e. well resolved),
2. Observed reports are more typically associated with smaller storms.

I believe there are a few factors at play here including the volume and spacing of reports on any particular day, and of course how well the model performs. 25 May and 27 Apr had lots of reports so they stand out. Plus all the issues associated with reports in general (timing and location uncertainty). But I think one thing also at work here is that these models have difficulty maintaining storms in the warm sector and tend to produce small, short-lived storms. This is relatively bad news for skill; but perhaps a decent clue for forecasters. I say clue because we really need a larger sample across a lot of different convective modes to make any firm conclusions.

I should address the hail issue noted above. I arbitrarily selected an integrated hail mixing ratio of 30 as the proxy for severe hail. I chose this value after checking out the distributions of the 3 severe variables (hourly max UH > 100 m2 s-2 for tornadoes, hourly max wind > 25.7 m s-1, hourly max hail > 30). After highlighting UH at various thresholds it became pretty clear that hail and UH were correlated. So I think we need to look for a better variable so we can relate hail-fall to modeled variables.
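To make the proxy definitions concrete, here is a minimal sketch of how the three thresholds above could be applied to a single storm object. The thresholds are the ones quoted in this post; the function name and the hazard labels are hypothetical, and this does not reproduce the cryptic report string used in the figures.

```python
# Thresholds quoted in the post. The hail value is an admittedly arbitrary choice.
UH_THRESHOLD = 100.0    # hourly max updraft helicity, m^2 s^-2 (tornado proxy)
WIND_THRESHOLD = 25.7   # hourly max wind speed, m s^-1
HAIL_THRESHOLD = 30.0   # hourly max integrated hail mixing ratio (units not stated in the post)


def modeled_reports(max_uh, max_wind, max_hail):
    """Return the set of severe proxies exceeded by one storm object."""
    hazards = set()
    if max_uh > UH_THRESHOLD:
        hazards.add("tornado")
    if max_wind > WIND_THRESHOLD:
        hazards.add("wind")
    if max_hail > HAIL_THRESHOLD:
        hazards.add("hail")
    return hazards


# A storm with strong rotation and hail but sub-severe wind.
print(sorted(modeled_reports(max_uh=120.0, max_wind=20.0, max_hail=45.0)))
```

Keeping the thresholds as named constants makes it easy to rerun the comparison with a different hail value, which is exactly the part that looks suspect here.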

Sneak Peek 2: Outbreak comparison

I ran my code over the entire 2011 HWT data set to compare the two outbreaks from 27 April and 24 May amidst all the other days. These outbreaks were not that similar … or were they?


In the first example, I am comparing the model storms that verified via storm reports: 40% for 27 April, only 17% for 24 May, but 37% for 25 May. 25 May also had a lot of storm reports, including a large number of tornado reports. Note the distribution of UHobj (upper left) is skewed toward lower values. The natural log of the pixel count per object (middle right) is also skewed toward lower values.
[If I further dice up the data set, requiring UHobj exceed 60, then 27 April has ~12%, 24 May has 7.8%, 25 May has 4% of the respective storms on those days (not shown). ]


In the second example, if I only select the UHobj greater than 60, the storm percentages are 25% for 27 Apr, 35% for 24 May, and 8% for 25 May. The natural log of the pixel count per object (middle right) is also skewed toward higher values. Hail and Wind parameters (middle left and bottom left, respectively) shift to higher values as well.

Very interesting interplay exists here since 24 May did not subjectively verify well (too late, not very many supercells). 27 Apr verified well, but had a different convective mode of sorts (linear with embedded supercells). 25 May I honestly cannot recall other than the large number of reports that day.

Comments welcome.

Sneak Peek from the past

So after the Weather Ready Nation: A Vital Conversation Workshop, I finally have some code and visualization software working. So here is a sneak peek, using the software Mondrian and an object identification algorithm that I wrote in Fortran, applied via NCL. Storm objects were defined using a double threshold, double area technique. Basically you set the minimum Composite Reflectivity threshold, and use the second threshold to ensure you have a true storm. The area thresholds apply to the reflectivity thresholds so that you restrict storm sizes (essentially as a filter to reduce noise from very small storms).
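The double threshold, double area idea can be sketched in a few lines. This is not the Fortran code mentioned above, only one plausible reading of the description: an object is a connected region above the lower reflectivity threshold, kept only if it is large enough and also contains enough pixels above the higher threshold. The function name, the 4-connectivity choice, and the 35/50 dBZ values in the example are all assumptions.

```python
from collections import deque


def find_storm_objects(refl, low_thresh, high_thresh, min_area_low, min_area_high):
    """Label storm objects in a 2-D reflectivity grid (list of lists, dBZ).

    An object is a 4-connected region of pixels >= low_thresh. It is kept only
    if it covers at least min_area_low pixels and contains at least
    min_area_high pixels >= high_thresh.
    """
    nrows, ncols = len(refl), len(refl[0])
    seen = [[False] * ncols for _ in range(nrows)]
    objects = []
    for r0 in range(nrows):
        for c0 in range(ncols):
            if seen[r0][c0] or refl[r0][c0] < low_thresh:
                continue
            # Breadth-first flood fill over the low-threshold region.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if not (0 <= rr < nrows and 0 <= cc < ncols):
                        continue
                    if seen[rr][cc] or refl[rr][cc] < low_thresh:
                        continue
                    seen[rr][cc] = True
                    queue.append((rr, cc))
            # Second threshold: count the intense core pixels inside the region.
            core = sum(1 for r, c in pixels if refl[r][c] >= high_thresh)
            if len(pixels) >= min_area_low and core >= min_area_high:
                objects.append(pixels)
    return objects


# Illustrative grid: one region with a 52 dBZ core, one small weak region.
grid = [
    [10, 35, 40, 10, 10, 10],
    [10, 38, 52, 10, 36, 10],
    [10, 10, 10, 10, 37, 10],
    [10, 10, 10, 10, 10, 10],
]
storms = find_storm_objects(grid, low_thresh=35, high_thresh=50,
                            min_area_low=3, min_area_high=1)
print(len(storms))
```

In this toy grid the two-pixel region on the right is dropped because it is too small and has no core, which is the noise-filtering behavior described above.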

So we have a few ensemble members from 27 April generated by CAPS which I was intent on mining. The volume of data is large but the number of variables was restricted to some environmental and storm centric perspectives. I added in the storm report data from SPC (soon I will have the observed storms).


In the upper left is a barchart of my cryptic recording of observed storm reports, below that is the histogram of hourly maximum surface wind speed, and below that is the integrated hail mixing ratio parameter. The two scatter plots in the middle show the (top) CAPE-0-6km Shear product versus the hourly maximum updraft helicity obtained from a similar object algorithm that intersects with the storm, and the (bottom) 0-1km Storm Relative Helicity vs the LCL height. The plots to the right show the (top) histogram of model forecast hour, (bottom) sorted ensemble member spinogram*, and (bottom inset) the log of the pixel count of the storms.

The red highlighted storms used a CASH value greater than 30 000 and UHobj greater than 50. So we can see interactively on all the plots, where these storms appear in each distribution. The highlighted storms represent 24.04 percent of the sample of 2271 storms identified from the 17 ensemble members over the 23 hour period from 1400 UTC to 1200 UTC.
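The highlighting itself is just a joint filter on two variables. Here is a minimal sketch, assuming CASH is the product of CAPE and 0-6 km shear as in the scatter plot description, and assuming hypothetical field names (`cape`, `shear06`, `uhobj`) for each storm record. Mondrian does this interactively; the code only mirrors the arithmetic.

```python
def highlighted_percent(storms, cash_min=30000.0, uh_min=50.0):
    """Percent of storms whose CAPE x shear product and UHobj both exceed the thresholds."""
    if not storms:
        return 0.0
    selected = [
        s for s in storms
        if s["cape"] * s["shear06"] > cash_min and s["uhobj"] > uh_min
    ]
    return 100.0 * len(selected) / len(storms)


# Made-up storm records, for illustration only.
sample = [
    {"cape": 2000.0, "shear06": 20.0, "uhobj": 75.0},  # passes both tests
    {"cape": 2500.0, "shear06": 15.0, "uhobj": 30.0},  # UHobj too low
    {"cape": 800.0, "shear06": 25.0, "uhobj": 90.0},   # CASH too low
    {"cape": 500.0, "shear06": 10.0, "uhobj": 10.0},   # fails both
]
print(highlighted_percent(sample))
```

For scale, 24.04 percent of 2271 storms works out to roughly 546 highlighted storms.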

Although the contributions from each member are nearly equivalent (not shown; cannot be gleaned from the spinogram easily), some members contribute more of their storms to this parameter space (sorted from highest to lowest in the member spinogram). The peak time for storms in this environment was at 2100 UTC with the 3 highest hours being from 2000-2200 UTC. Only about half of the modeled storms had observed storm reports within 45 km**. This storm environment contained the majority of high hail values though the hail distribution has hints of being bimodal. The majority of these storms had very low LCL heights (below 500 m) though most were below 1500 m.

I anticipate using these tools and software for the upcoming HWT. We will be able to do next-day verification using storm reports (assuming the WFOs update storm reports in a timely manner) and I hope to also do a strict comparison to observed storms. I still have work to do in order to approach distribution-oriented verification.

*The spinogram in this case represents a bar chart where the length of the bar is converted to 100 percent and the width of the bar is the sample size. The red highlighting now represents the within category percentage.

**I also had to do a +/- 1 hour time period. An initial attempt to verify the tornado reports in comparison to the tornado tracks yielded a bit of spatial error. This will need to be quantified.
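The report matching described in this post (within 45 km and +/- 1 hour of a modeled storm) can be sketched as follows. The haversine formula and the 6371 km Earth radius are my choices for the distance calculation, and the tuple layout and function names are hypothetical; the actual matching code may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption for this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def reports_near_storm(storm, reports, max_km=45.0, max_hours=1.0):
    """Return reports within max_km and +/- max_hours of a modeled storm.

    storm and each report are (lat, lon, hour) tuples, with hour measured
    in hours from a common reference time.
    """
    slat, slon, shour = storm
    return [
        rep for rep in reports
        if abs(rep[2] - shour) <= max_hours
        and great_circle_km(slat, slon, rep[0], rep[1]) <= max_km
    ]


# Made-up storm and reports, for illustration only.
storm = (35.0, -97.0, 21.0)
reports = [
    (35.2, -97.1, 21.5),  # close in space and time
    (36.0, -97.0, 21.0),  # about 111 km away
    (35.1, -97.0, 23.5),  # close, but 2.5 hours later
]
print(reports_near_storm(storm, reports))
```

Both tolerances are parameters, so the spatial error noted for the tornado tracks could be explored by simply varying `max_km`.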

More Data Visualization

As jimmyc touched on in his last post, one of the struggles facing the Hazardous Weather Testbed is how to visualize the incredibly large datasets that are being generated. With well over 60 model runs available to HWT Experimental Forecast Program participants, the ability to synthesize large volumes of data very quickly is a must. Historically we have utilized a meteorological visualization package known as NAWIPS, which is the same software that the Storm Prediction Center uses for their operations. Unfortunately, NAWIPS was not designed with the idea it would be handling the large datasets that are currently being generated.

To help mitigate this, we utilized the Internet as much as possible. One webpage that I put together is a highly dynamic CI forecast and observations webpage. This webpage allowed users to create 3, 4, 6, or 9 panel plots, with CI probabilities of any of 28 ensemble members, NSSL-WRF, or observations. Furthermore, users had the ability to overlay the raw CI points from any of the ensemble members, NSSL-WRF, or observations to see how the points contributed to the underlying probabilities. We even enabled it so that users could overlay the human forecasts to see how they compared to any of the numerical guidance or observations. This webpage turned out to be a huge hit with visitors, not only because it allowed for quick visualization of a large amount of data, but because it also allowed visitors to interrogate the ensemble from anywhere, not just in the HWT.

One of the things we could do with this website is evaluate the performance of individual members of the ensemble. We could also evaluate how varying the PBL schemes affected the probabilities of CI. Again, the website is a great way to sift through a large amount of data in a relatively short amount of time.


We all did some web displays for various components of the CI desk. I built a few web displays based on object identification of precipitation areas. I counted up the objects per hour for all ensemble or all physics members (separate web pages) in order to 1) rapidly visualize the entire membership and 2) to add a non-map based perspective of when interesting things are happening. It also allows the full perspective of the variability in time, and variability of position and size of the objects.

The goal was to examine the models in multiple ways simultaneously and still investigate the individual members. This, in theory, should be more satisfying for forecasters as they get more comfortable with ensemble probabilities.  It could alleviate data overload by giving a focused look at select variables within the ensemble. Variables that already have meaning and implied depth. Information that is easy to extract and reference.

The basic idea as implemented was to show the object count chart and upon mousing over a grid cell you can call up a map of the area with the precipitation field. At the upper and right most axes, you call up an animation of either all the models at a specific time OR one model at all times. The same concept was applied to updraft helicity.
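The table behind the object count chart is simple to build. Here is a minimal sketch, assuming each identified object is recorded as a (member, forecast hour) pair; the function name and the member labels are hypothetical.

```python
from collections import Counter


def object_count_grid(objects, members, hours):
    """Build a member-by-hour table of object counts.

    objects is an iterable of (member, forecast_hour) pairs, one per object.
    Rows follow the order of members, columns the order of hours.
    """
    counts = Counter(objects)
    return [[counts[(m, h)] for h in hours] for m in members]


# Made-up objects from two members over three hours.
objects = [
    ("m01", 18), ("m01", 18), ("m01", 19),
    ("m02", 19), ("m02", 20), ("m02", 20), ("m02", 20),
]
grid = object_count_grid(objects, members=["m01", "m02"], hours=[18, 19, 20])
for row in grid:
    print(row)
```

Each cell of this table corresponds to one mouse-over map, a row corresponds to the "one model at all times" animation, and a column to the "all models at a specific time" animation.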

I applied the same idea to the convection initiation points only this time there were no objects, just the raw number of points. I had not had time to visualize this prior to the experiment, so we used this as a way to compare two of the definitions in test mode.

The ideas were great, but in the end there were a few issues. The graphics were good in some instances because we started with no precipitation or updraft helicity or CI points. But if the region already had storms then interpretation was difficult, at least in terms of the object counts. This was a big issue with the CI points, especially as the counts increased well above 400, for a 400 by 400 km sub domain.

Another display I worked hard on was the so-called pdf generator. The idea was to use the ensemble to reproduce what we were doing, namely putting our CI point on the map where we thought the first storm would be. Great in principle, but automating this was problematic because we could choose our time window to fit the situation of the day. The other complication was that sometimes we had to make our domain small or big, depending on how much pre-existing convection was around. This happened quite frequently so the graphic was less applicable, but still very appealing. It will take some refinement but I think we can make this a part of the verification of our human forecasts.
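To show why the time window and domain matter so much, here is a rough sketch of the pdf generator idea: take each member's first CI point inside the chosen window and domain, then tally where those first points land on a coarse grid. This is an illustration under my own assumptions (the 1 degree boxes, the data layout, and the function name are hypothetical), not the code behind the display.

```python
from collections import Counter


def first_ci_frequency(ci_points, t_start, t_end, domain, box_deg=1.0):
    """Relative frequency of each member's first CI point, binned on a lat/lon grid.

    ci_points maps member name -> list of (time, lat, lon) tuples.
    domain is (lat_min, lat_max, lon_min, lon_max).
    Returns {(row, col): fraction of contributing members}.
    """
    lat_min, lat_max, lon_min, lon_max = domain
    firsts = []
    for points in ci_points.values():
        inside = [
            p for p in points
            if t_start <= p[0] <= t_end
            and lat_min <= p[1] <= lat_max
            and lon_min <= p[2] <= lon_max
        ]
        if inside:
            # Tuples sort by time first, so min() picks the earliest point.
            _, lat, lon = min(inside)
            firsts.append((int((lat - lat_min) // box_deg),
                           int((lon - lon_min) // box_deg)))
    counts = Counter(firsts)
    return {box: n / len(firsts) for box, n in counts.items()}


# Made-up CI points for four members; times are in hours UTC.
ci_points = {
    "m01": [(20.0, 35.5, -98.2), (19.0, 36.1, -97.5)],
    "m02": [(19.5, 36.4, -97.9)],
    "m03": [(21.0, 35.2, -99.1)],
    "m04": [(15.0, 36.0, -97.0)],  # before the window, so it does not count
}
freq = first_ci_frequency(ci_points, t_start=18.0, t_end=24.0,
                          domain=(34.0, 38.0, -100.0, -96.0))
print(freq)
```

Changing `t_start`, `t_end`, or `domain` changes which point counts as "first" for each member, which is the automation problem described above.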

I found this type of web display to be very useful and very quick. It also allows us to change our perspective from just data mining to information mining and consequently to think more about visualization of the forecast data. There is much work to be done in this regard and I hope some of these ideas can be further built upon for visualization and Information Mining so they can be more relevant to forecasters.