SFE 2018 Wrap-Up Blog

Well, this wrap-up blog is a bit later than usual, probably due to the amount of communicating we’ve been doing about the Spring Forecasting Experiment at other venues! Multiple talks on different aspects of the SFE took place during the American Meteorological Society’s 29th Conference on Weather Analysis and Forecasting/25th Conference on Numerical Weather Prediction in Denver, CO, from 4–8 June. Considering that the experiment concluded the Friday before the meeting began, data doesn’t get much fresher than that! The talks included an overview of the experiment, a discussion of the new web capabilities in the 2018 SFE, HREF configurations, subjective and objective evaluation of FV3 configurations contributed by different agencies, new experimental verification methods, scorecard development, and an examination of the CAPS FV3 Ensemble performance. The official summary report is forthcoming, but those talks provide a great sampling of preliminary results from the experiment.

More analysis is underway, but objective results available on the SFE homepage summarize a small subsample of CAMs according to a preliminary set of performance metrics, including reflectivity at selected thresholds and surrogate severe fields generated from the updraft helicity (UH) fields. These surrogate severe fields will be updated, since the initial tests used a fixed threshold of UH rather than a percentile of UH. As the climatology of UH can vary greatly between models (one model may consistently produce higher UH values than another) and is grid-spacing-dependent, using a percentile rather than a fixed value better captures differences between guidance that might otherwise be masked by differing climatologies.
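The difference between a fixed and a percentile threshold can be illustrated with a minimal numpy sketch. The UH fields below are synthetic (drawn from a gamma distribution, with one hypothetical model scaled to run "hotter"); the threshold and percentile values are illustrative, not the ones used in the SFE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic UH fields for two hypothetical models whose climatologies
# differ by a scale factor (model B systematically produces higher UH).
uh_model_a = rng.gamma(shape=2.0, scale=10.0, size=(100, 100))
uh_model_b = 1.5 * rng.gamma(shape=2.0, scale=10.0, size=(100, 100))

FIXED_THRESHOLD = 75.0  # m^2/s^2, an illustrative absolute cutoff
PERCENTILE = 99.0       # per-model percentile used instead

def exceedance_fraction(field, threshold):
    """Fraction of grid points exceeding a threshold."""
    return float(np.mean(field > threshold))

# Fixed threshold: the "hotter" model flags far more points, purely
# because of its climatology rather than a real forecast difference.
frac_a_fixed = exceedance_fraction(uh_model_a, FIXED_THRESHOLD)
frac_b_fixed = exceedance_fraction(uh_model_b, FIXED_THRESHOLD)

# Percentile threshold: each model's own climatology sets the bar,
# so both flag roughly the top 1% of their own points.
thr_a = np.percentile(uh_model_a, PERCENTILE)
thr_b = np.percentile(uh_model_b, PERCENTILE)
frac_a_pct = exceedance_fraction(uh_model_a, thr_a)
frac_b_pct = exceedance_fraction(uh_model_b, thr_b)

print(frac_a_fixed, frac_b_fixed)  # very unequal under a fixed cutoff
print(frac_a_pct, frac_b_pct)      # comparable under percentile cutoffs
```

The percentile approach effectively normalizes each model against its own UH climatology before the surrogate severe fields are built, which is what lets genuine forecast differences show through.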

Reflectivity > 35 dBZ CSI scores aggregated over the five weeks of the 2018 SFE for different lead times.

Overall, the HRRRv3 reflectivity fields performed better than either of the FV3 models examined in this comparison, with the most drastic difference starting during the afternoon convective period. This difference is also reflected in the subjective evaluation of the reflectivity and UH fields, with the HRRRv3 receiving more ratings of 7-9 (out of 10) than the other models.
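For readers unfamiliar with the Critical Success Index (CSI) used in the figure above, it is hits divided by the sum of hits, misses, and false alarms for a thresholded field. A minimal sketch, using toy reflectivity grids rather than any SFE data:

```python
import numpy as np

def csi(forecast, observed, threshold=35.0):
    """Critical Success Index for a thresholded field.

    CSI = hits / (hits + misses + false alarms); 1 is a perfect
    forecast, 0 means no overlap between forecast and observed events.
    """
    fcst_event = forecast >= threshold
    obs_event = observed >= threshold
    hits = np.sum(fcst_event & obs_event)
    misses = np.sum(~fcst_event & obs_event)
    false_alarms = np.sum(fcst_event & ~obs_event)
    denom = hits + misses + false_alarms
    return hits / denom if denom > 0 else np.nan

# Toy example: a forecast feature displaced one column from the obs.
obs = np.zeros((5, 5)); obs[2, 1:3] = 40.0
fcst = np.zeros((5, 5)); fcst[2, 2:4] = 40.0
print(csi(fcst, obs))  # 1 hit, 1 miss, 1 false alarm -> 1/3
```

Note that, as the toy example shows, a slightly displaced forecast is penalized twice (one miss plus one false alarm), which is exactly the behavior that neighborhood-based scores are designed to soften.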

Subjective participant ratings of several convection-allowing models initialized at 0000 UTC

Looking at the FV3 configurations produced by CAPS with two different microphysics schemes (participants compared two ensemble members, one using each scheme), participants most often noted differences in reflectivity and UH location, rather than in reflectivity magnitude or storm mode.

Participant responses to the question, “How do the following variables compare between the FV3 members with the Thompson microphysics and the FV3 members with the NSSL microphysics?”

And how was the weather during the 2018 SFE? Well, the SPC put out a great graphic (below) showing some of the severe weather highlights during the season, and overall, the numbers of severe thunderstorm and tornado watches were well below average, as were the numbers of wind and tornado reports. The first and last weeks of the experiment had the most exciting weather, including a tornado that passed through north Norman (participants were especially attentive on this day, for some reason!). While the weather may not have been fully cooperative, the dataset created from the different NWP models over the five weeks of the experiment will surely generate some fruitful results. After all, determining how the NWP performs in weakly forced cases is critical, as good guidance could help decision-makers make the right call on days with high uncertainty.

Summary graphic of May 2018 Severe Convective Weather produced by the SPC

The experimental hourly probabilities of severe weather based on NEWS-e guidance are also currently being analyzed, pulling together information about how participants used the NEWS-e guidance and which tools they used most frequently, alongside the forecast performance and participant opinions about their forecasts and the guidance. The experimental timing forecasts (Potential Severe Timing areas, or PSTs for short) are likewise still being analyzed, but they proved much more understandable to participants than prior timing products we have tested in the SFE.

Wrapping up, the entire SFE team and I would like to express our appreciation to everyone who helped coordinate and contributed to the experiment. Although it only takes place for five weeks every year, months of planning and development happen beforehand, both at the National Weather Center and with our partner agencies throughout the country. We would also like to thank the participants, who share their knowledge, time, advice, and enthusiasm for severe convective weather forecasting during their time in the experiment. As we delve into the results from SFE 2018, let the anticipation for SFE 2019 begin!


FV3 Configurations

Though we do lots of forecasting in the Spring Forecasting Experiment, we also evaluate cutting-edge numerical weather prediction. This year, we’re looking at many iterations of the Finite-Volume Cubed-Sphere model, or FV3, which is slated to eventually replace the GFS as the next-generation global prediction system. This year, we have versions of FV3 provided by the Geophysical Fluid Dynamics Laboratory (GFDL; where FV3 was developed), the National Severe Storms Laboratory (NSSL), and the Center for Analysis and Prediction of Storms (CAPS). GFDL and NSSL are each providing one member, while CAPS is providing an ensemble of eleven members that use different microphysics schemes and different planetary boundary layer (PBL) schemes. These different configurations can help illuminate the behavior of this new model core with multiple sets of physical parameterizations, grid spacings, and FV3 versions. For more details, see the Operations Plan.

Last Wednesday, May 23rd, provided one of our most interesting case studies for the different versions of CAPS FV3, with differences in the storm structure, location of precipitation, and thermodynamic environment between the different microphysics and the different PBL schemes. The below panels show the reflectivity fields for CAPS FV3 members with four different PBL schemes, all of which use the Thompson microphysics.



Domain Decisions

When severe weather is imminent across the country, forecasters at the SPC must consider all of the possible areas. However, here at the SFE where we are considering multiple experimental NWP ensembles, we have to select our domain of interest. When the severe weather is spread across an area larger than a single domain, we must choose which region we want to focus on. To do this, the facilitators consult not only our experimental numerical weather prediction models, but also the upper-air data collected by radiosondes. Typically, the domains have quite a bit of overlap, but on days like last Thursday (17 May), a decision must be made between two very different areas.

After evaluating the previous day’s forecasts, the first forecasting activity participants do each day is the hand-analysis of upper-air maps at six different levels: 250 mb, 500 mb, 700 mb, 850 mb, 925 mb, and the surface. This activity ensures that participants get their hands on the observed data and develop a thorough understanding of where convectively favorable environments will occur. A map discussion follows (pictures of which can be found on the NSSL Flickr account), where participants share what they’ve learned through their contours. On Thursday, the conversation was far-reaching, as we had to decide between two domains that had very little overlap:

The two potential areas of interest for 17 May 2018, with the eventual selected domain highlighted in green.


Consider the Messaging

One of the advantages of having forecasters from the Storm Prediction Center (SPC) working on both the Innovation Desk and the Severe Hazards Desk is that they have experience with issuing outlooks and with how the public reacts to those outlooks. Thus, we can explore the nuances of forecast issuance better than if we were operating without the SPC forecasters. An example of this dynamic occurred on Tuesday this week, with the well-forecast line of storms that moved through the northeastern United States, bringing many reports of wind damage, injuries, and even fatalities. The experimental forecasts issued by both the Severe Hazards Desk and the Innovation Desk were the equivalent of a categorical Moderate Risk from the SPC, which can be triggered by either a 45% forecast of wind with significant wind gusts or a 60% forecast of wind without significant wind gusts:

On the Innovation Desk side, there was a lot of debate about which type of moderate risk to issue. A categorical high risk was also considered, in the form of a 60% coverage contour with significant severe. Although we thought there would be pretty high coverage of severe wind reports, the low-level shear and strong low-level flow that would typically warrant a high-risk scenario were missing on Tuesday. High-risk issuance for wind from the SPC is rare (as are high risks in general) and typically reserved for derecho-type events; the last wind-driven high risk was issued by the SPC on 3 June 2014 across Nebraska. Thus, an aggressive moderate was the message the Innovation Desk pursued, with a 45% area and an area of significant severe (wind gusts in this case, although the Innovation Desk considers all types of severe convective weather in issuing its probabilities):

The Severe Hazards Desk issued a moderate that emphasized the high coverage potential of the event, going with a 60% contour with no significant wind highlighted:

Both of these outlooks indicated that the meteorological conditions were unusually favorable for a wind event, with strong mid-level flow and excellent lapse rates for the northeast. However, maintaining the equivalent of a moderate categorical risk communicated that this system, while dangerous and powerful, would not rise to the level of some of the most infamous derechos of years past. In meteorology, as in many things, communication is key.


Forecasting During Quieter Days

Although May is the heart of severe convective season across most of the United States, some days still have relatively little severe convective weather. This has been the case for most of this week, with yesterday having the most severe convective reports of the week (48) according to the Storm Prediction Center’s (SPC’s) storm report page. However, just because there are fewer storms occurring doesn’t mean that operations halt during the Spring Forecasting Experiment – quite the opposite! While we may have fewer probabilistic contours to draw when probabilities aren’t as high, the placement and magnitude of these forecast contours are still a significant challenge.

These types of days challenge the models in a different way, showing how they perform on days that are less strongly forced on the synoptic scale or may not have as much available moisture as the days with more storm coverage. Considering that only about twelve days per year reach the level of a moderate risk according to the SPC, lower-end days are far more common and thus require thorough testing as well.

This week has so far exemplified a number of different lower-risk days, mainly associated with a slow-moving trough progressing across the contiguous United States. On Monday, a smattering of wind and hail reports affected western South Dakota and eastern Wyoming. Absent strong upper-level flow, terrain was a large consideration in our forecasts. Tuesday saw a handful of hail reports and a tornado report in eastern South Dakota and across Iowa. Ongoing elevated convection in the morning complicated this forecast, as the main area of convective concern was not starting with a clean environment. Yesterday, a morning mesoscale convective vortex (a feature originally known as a “Neddy Eddy” after retired SPC forecaster Ned Johnston) led to questions about when convection would initiate and how well the models captured the relatively small-scale vortex.

Even when our forecasts are challenging and don’t perform as well as we’d like, we still perform subjective verification the following day. Below are the full-period forecasts for Monday for total severe from the Innovation Desk (left) and for wind from the Severe Hazards Desk (right):

Tuesday for total severe from the Innovation Desk (left) and for hail from the Severe Hazards Desk (right):

and Wednesday for total severe from the Innovation Desk (left) and again for wind from the Severe Hazards Desk (right): 

While these forecasts were far from perfect, we issue these full-period forecasts by ~10 AM CDT and don’t update them later in the day. Therefore, they represent our initial impression of the weather for that day after only an hour or two of consideration – and additional observations and numerical guidance keep arriving as the day goes on. Particularly on Wednesday, this updated guidance caused us to shift our short-time-period forecasts (which we do update) as the afternoon wore on. Clearly, more marginal forecasts still present challenges, just different ones than the high-end days. We always have something to consider here during the SFE!


Short-Term Forecasting Methods

This year, the Spring Forecasting Experiment is focusing on the Day 1 time period more than ever before, eschewing the long lead-time forecasts that we have made in previous years in favor of homing in on timing information and allowing participants to delve into the data. Since more data than ever before is available within the tool where participants draw their forecasts, we’re excited to see how participants probe the new data within the various ensemble subsets.

One short-term experimental forecast product being generated on the Innovation Desk this year is the Potential Severe Timing (PST) area, which indicates the 4-hr period in which severe weather is expected within the general 15% probability of severe area. By identifying the timing of the severe event and displaying all of the timing contours on one graphic, we hope the end product will be valuable to emergency managers and broadcasters for their advance planning. Small groups of participants generate these forecasts using subsets of the CLUE and HREFv2 ensembles, meaning that on any given day we’ll ideally have 5 separate sets of PSTs. After the participants separate into their small groups and issue their forecasts, we ask them to come back together and brief one another on what their particular ensemble subset was doing. This way, each group can delve into the data from its subset more deeply than if the activity took place as one large group. The briefing period also exposes participants to different lines of reasoning in issuing their forecasts, and has thus far sparked several good discussions.

Here are the PSTs from 3 May 2018, or Thursday of last week. The different ensemble subset groups compose the top row and the left and middle panels of the bottom row, while the bottom right panel shows the forecast from the expert forecaster facilitator on the Innovation Desk. Several different strategies are evident within the panels, including some groups that chose not to indicate timing areas for all of the 15% area of our full-period outlook (shown below).

The reasoning from the groups for their different areas gave insight into the model performance as well as the forecasting strategies employed by the different groups of people. The group using the HREFv2 decided not to use the NMMB member when generating their forecasts, because its depiction of morning convection was so poor. The HRRRE group had very large areas, which they attributed to the large spread within the HRRRE. The NCAR group decided to discount the guidance in the northern part of the domain because of erroneous convection there; instead, they felt more confident in the southern areas where the ensemble was producing supercells, as they thought the thermodynamics of the northern area were less conducive to supercellular convection. The group using the mixed-physics ensemble from CAPS placed their first area based on where they thought convective initiation would occur, indicating that they thought convection would quickly become severe. Their southern PST was very late, to cover any severe threat overnight, though they considered that it might be more of a flood threat (which we do not forecast in the Spring Forecasting Experiment). The stochastic-physics group (another ensemble run by CAPS), on the other hand, had an ensemble that showed almost no signal in the southern area of interest; it also showed a later signal than the other ensembles, contributing to the spread in the time of the first PST.

All of these details came out during the discussion of the PSTs, after participants dove into the data from their subensemble. How did the PSTs do? Here’s a snapshot of the PSTs with reports from 18-22 UTC overlaid. Ideally, all of the reports would fall into the 18-22 UTC contours, which mostly occurred for the expert forecaster and did occur for the HRRRE and Mixed Physics groups, although both had large areas of false alarm. Here’s a similar image, but showing reports from 22-02 UTC. At this point in time, all groups missed the activity in Kansas, although some groups captured most of the reports within a 22-02 UTC window.

The day after the forecasts, participants are able to go through and give ratings based on the reports that have come in, and choose the group’s forecast they thought performed best. Who performed best for this case? 3 votes for the HREFv2, 2 votes each for the HRRRE and the CAPS Stochastic Physics ensemble, and one vote each for the CAPS Mixed Physics and NCAR ensemble groups. Clearly, the complexity of this case provided plenty of nuances to evaluate, and I would bet that more complex cases like this are on the way… after all, we’ve only just begun Week 2 of the 2018 SFE!

Springing into SFE 2018

Somehow, it’s already that time of year again, when Gulf moisture surges northward, strong upper-level dynamics sweep across the contiguous United States, and forecasters and researchers alike flock to NOAA’s Hazardous Weather Testbed for the annual Spring Forecasting Experiment. The upcoming week looks to be a busy one, with a strong trough poised to move across the United States over the course of the next five days.

This year’s experiment will be quite different from prior experiments, as we’ve listened to the feedback participants have given us. Full details can be found in this year’s Operations Plan, but some highlights include:

  • Completely redesigned web pages for the forecasts and evaluations, courtesy of our new webmaster Brett Roberts,
  • The capability of participants to dive into the data, with multiple experimental subsets’ data available within the forecast drawing tool,
  • Experimental outlooks driven by ensemble subsets, focusing on high temporal resolution forecasts within the Day 1 period,
  • NEWS-e activities on both the Severe Hazards Desk and the Innovation Desk, allowing all participants to interact daily with the NEWS-e,
  • Larger Chromebooks, with bigger screens to make looking at all this data easier,
  • and of course, a new blog site with more features that I hope to explore throughout the experiment!

Of course, we will also be exploring a number of new concepts in SFE 2018, encompassing both forecast methods and ensemble configuration techniques within the CLUE framework (see Clark et al. 2018 for a formal description of the CLUE from previous years). We’ll generate probabilistic forecasts of individual hazards over 4-h windows, a timing graphic that communicates when we expect areas to see severe weather, and hourly probabilistic forecasts of severe weather informed by the NEWS-e. We’ll examine the impact of different physics parameterizations in the FV3 model, the impact of stochastic physics perturbations on the WRF-ARW model, new methods of ensemble subsetting based on sensitivity, the implementation of the MET scorecard for CAM ensembles, and new object-based visualization techniques. This feels like one of the most varied SFEs we’ve had in a while – there’s sure to be something interesting for everyone!

I know I speak for all of the facilitators when I say that we’re excited for this year’s experiment. Whether you’re travelling to Norman over the next five weeks or following along online, we hope that this year’s experiment will provide plenty of interesting results for real-time analysis. Stay tuned!

SFE 2017 Wrap-Up

The 2017 SFE drew to a close a little over a week and a half ago, and on behalf of all of the facilitators, I would like to thank everyone who participated in the experiment and contributed products. Each year, preparation for the next experiment begins nearly immediately after the conclusion of the SFE, and this year was no exception.

This SFE was busier than SFE 2016, in that the Innovation Desk forecast a 15% probability of any severe hazard every day during the experiment – and a 15% verified according to the practically perfect forecasts based on preliminary LSRs. This was despite having a relatively slow final week. Slower weeks typically occur at some point during the experiment, and enhance the operational nature of the experiment. After all, SPC forecasters are working 365 days a year, whatever the weather may be! The Innovation Desk also issued one of their best Day 1 forecasts of the experiment during the final week, successfully creating a gapped 15%. If you read the “Mind the Gap” post, you know the challenges that go into a forecast like this:

This forecast was a giant improvement over the previously-issued Day 2 forecast, which had the axis of convection much too far north:

As for other takeaways from the experiment, the NEWS-e activity introduced an innovative tool for the forecasters, and will likely continue to play a role in future SFEs. Leveraging convection-allowing models at time scales from hours (i.e., NEWS-e, the developmental HRRR) to days (i.e., the CLUE, FVGFS) allows forecasters to understand the current capabilities of those models. Similarly, researchers can see how the models are performing under severe convective conditions and target areas for improvement. A good example of this came from comparing different versions of the FVGFS – two different versions were run with different microphysics schemes, and produced different-looking convective cores. Analyzing the subjective and objective scores post-experiment will allow the developers to improve the forecasts. For anyone interested in keeping up with some of these models, a post-experiment model comparison website has been set up. Under the Deterministic Runs tab, you can look at output for the FVGFS, UK Met Office model, the 3 km NSSL-WRF, and the 3 km NAM from June 5th onward.

Much analysis remains to be done on the subjective and objective data generated during the experiment. Preliminary Fractions Skill Scores (FSSs) for each day:

and aggregated across the days for each hour:

give a preliminary metric of each ensemble’s performance. The FSS compares the fraction of gridboxes covered by a phenomenon (in this case, reflectivity) within a certain radius in the forecast and the observations, thereby eliminating the double penalization incurred when a phenomenon is slightly displaced between the forecast and the observations. The closer the score is to one, the better. Now, there are some data drop-outs in this preliminary data, but it still looks as though the SSEO is performing better than most other ensembles. Aggregated scores across the experiment place the SSEO first, with an FSS of .593. The HREFv2, which is essentially an operationalized SSEO with some differences in the members, was second, with an FSS of .592. Other high-performing ensembles include the NCAR ensemble (.580) and the HRRR ensemble (.559). Again, these data are preliminary, and the numbers will likely change as the cases that didn’t run on time during the experiment are rerun.
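For the curious, the neighborhood idea behind the FSS can be sketched in a few lines of numpy. This is a minimal illustration on toy grids, not the SFE verification code: both fields are thresholded into binary event grids, smoothed into neighborhood fractions, and then compared.

```python
import numpy as np

def neighborhood_fractions(binary, window):
    """Fraction of event points in a window x window box around each cell
    (zero-padded at the edges; window assumed odd)."""
    half = window // 2
    padded = np.pad(np.asarray(binary, dtype=float), half)
    out = np.empty(binary.shape, dtype=float)
    for i in range(binary.shape[0]):
        for j in range(binary.shape[1]):
            out[i, j] = padded[i:i + window, j:j + window].mean()
    return out

def fss(forecast, observed, threshold=35.0, window=5):
    """Fractions Skill Score: compare neighborhood event fractions rather
    than exact gridpoint matches. 1 is perfect at this scale; 0 is no skill."""
    pf = neighborhood_fractions(forecast >= threshold, window)
    po = neighborhood_fractions(observed >= threshold, window)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

# Toy example: a reflectivity feature displaced by two columns is fully
# penalized point-by-point, but earns partial credit at a larger scale.
obs = np.zeros((9, 9)); obs[4, 2:4] = 40.0
fcst = np.zeros((9, 9)); fcst[4, 4:6] = 40.0
print(fss(fcst, obs, window=1))  # no overlap at gridscale -> 0
print(fss(fcst, obs, window=5))  # partial credit at the 5-point scale
```

The window size controls the scale at which displacement errors are forgiven, which is why aggregated FSS values are usually reported alongside the neighborhood radius used.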

As for what SFE 2018 will hold, discussions are already underway. Expect to see more of the CLUE, FVGFS, and NEWS-e. A switch-up in how the subjective evaluations are done and a revamp of the website are also in the pipeline. Even as the data from SFE 2017 begin to be analyzed, we look forward to SFE 2018 and how we can continue to improve the experiment. Ever onward!
