More Data Visualization

As jimmyc touched on in his last post, one of the struggles facing the Hazardous Weather Testbed is how to visualize the incredibly large datasets being generated. With well over 60 model runs available to HWT Experimental Forecast Program participants, the ability to synthesize large volumes of data very quickly is a must. Historically we have utilized a meteorological visualization package known as NAWIPS, the same software the Storm Prediction Center uses for its operations. Unfortunately, NAWIPS was not designed to handle datasets as large as those currently being generated.

To help mitigate this, we utilized the Internet as much as possible. One webpage I put together is a highly dynamic CI forecast and observation page. It allowed users to create 3-, 4-, 6-, or 9-panel plots with CI probabilities from any of the 28 ensemble members, the NSSL-WRF, or observations. Furthermore, users could overlay the raw CI points from any of the ensemble members, the NSSL-WRF, or observations to see how those points contributed to the underlying probabilities. We even made it possible to overlay the human forecasts to see how they compared to the numerical guidance or observations. This webpage turned out to be a huge hit with visitors, not only because it allowed quick visualization of a large amount of data, but also because it allowed visitors to interrogate the ensemble from anywhere, not just in the HWT.
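For readers curious what drives such a page, here is a minimal sketch of the kind of multi-panel plot it served, written in Python with matplotlib. The member names, probability fields, and CI points below are placeholders for illustration, not the actual HWT output.

```python
# Minimal sketch of a multi-panel CI probability display with raw CI
# points overlaid. Member names and all data are placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
members = ["member_01", "member_02", "member_03", "NSSL-WRF"]  # hypothetical

fig, axes = plt.subplots(2, 2, figsize=(10, 8))  # a 4-panel layout
for ax, name in zip(axes.flat, members):
    prob = rng.random((50, 50))              # stand-in CI probability field
    pts = rng.uniform(0, 50, size=(20, 2))   # stand-in raw CI points
    mesh = ax.pcolormesh(prob, cmap="viridis", vmin=0, vmax=1)
    ax.scatter(pts[:, 0], pts[:, 1], s=10, c="red", label="CI points")
    ax.set_title(name)
fig.colorbar(mesh, ax=axes.ravel().tolist(), label="CI probability")
plt.show()
```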

One of the things we could do with this website was evaluate the performance of individual members of the ensemble. We could also evaluate how varying the PBL schemes affected the probabilities of CI. Again, the website is a great way to sift through a large amount of data in a relatively short amount of time.
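As a rough illustration of the PBL comparison, one might group members by scheme and average their CI probability fields. The scheme assignments and probability data below are invented (though MYJ, YSU, and MYNN are common WRF PBL schemes), so treat this as a sketch of the idea rather than our actual method.

```python
# Sketch: compare CI probabilities across PBL schemes by averaging
# the members that share a scheme. All assignments/data are invented.
import numpy as np

schemes = {"MYJ": [0, 1, 2], "YSU": [3, 4, 5], "MYNN": [6, 7, 8]}
rng = np.random.default_rng(3)
member_probs = rng.random((9, 50, 50))  # per-member CI probability fields

for name, idx in schemes.items():
    mean_prob = member_probs[idx].mean(axis=0)  # scheme-mean field
    print(f"{name}: domain-max mean CI probability = {mean_prob.max():.2f}")
```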

Visualization

We all built some web displays for various components of the CI desk. I built a few based on object identification of precipitation areas, counting up the objects per hour for all ensemble members or all physics members (on separate web pages) in order to 1) rapidly visualize the entire membership and 2) add a non-map-based perspective of when interesting things were happening. The displays also give a full view of the variability in time and in the position and size of the objects.
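The object counting itself can be sketched as simple connected-component labeling, for example via scipy.ndimage. The precipitation fields and the 1 mm threshold below are assumptions for illustration, not the exact method behind the display.

```python
# Sketch of the object-counting idea: label contiguous areas of hourly
# precipitation above a threshold and count them per member per hour.
import numpy as np
from scipy import ndimage

def count_objects(precip_field, threshold=1.0):
    """Count contiguous precipitation objects above `threshold` (mm)."""
    binary = precip_field > threshold
    _, n_objects = ndimage.label(binary)  # 4-connected labeling by default
    return n_objects

# Hypothetical stack of fields: (n_members, n_hours, ny, nx)
rng = np.random.default_rng(0)
fields = rng.gamma(0.3, 2.0, size=(28, 24, 100, 100))

counts = np.array([[count_objects(fields[m, h]) for h in range(24)]
                   for m in range(28)])  # rows: members, cols: hours
print(counts.shape)  # (28, 24): the member-by-hour chart behind the display
```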

The goal was to examine the models in multiple ways simultaneously while still being able to investigate the individual members. This, in theory, should be more satisfying for forecasters as they get more comfortable with ensemble probabilities. It could alleviate data overload by giving a focused look at select variables within the ensemble: variables that already carry meaning and implied depth, and information that is easy to extract and reference.

The basic idea, as implemented, was to show the object count chart; mousing over a grid cell calls up a map of the area with the precipitation field, and the uppermost and rightmost axes call up an animation of either all the models at a specific time or one model at all times. The same concept was applied to updraft helicity.

I applied the same idea to the convection initiation points, except this time there were no objects, just the raw number of points. I had not had time to visualize this prior to the experiment, so we used it as a way to compare two of the CI definitions in test mode.

The ideas were great, but in the end there were a few issues. The graphics worked well in cases that started with no precipitation, updraft helicity, or CI points. But if the region already had storms, interpretation was difficult, at least in terms of the object counts. This was a big issue with the CI points, especially as counts climbed well above 400 for a 400 by 400 km subdomain.

Another display I worked hard on was the so-called pdf generator. The idea was to use the ensemble to reproduce what we were doing as forecasters, namely putting our CI point on the map where we thought the first storm would be. Great in principle, but automating this was problematic because we could choose our time window to fit the situation of the day. The other complication was that sometimes we had to make our domain small or large depending on how much pre-existing convection was around. This happened quite frequently, so the graphic was less applicable, but still very appealing. It will take some refinement, but I think we can make this a part of the verification of our human forecasts.
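Conceptually, the pdf generator reduces to pooling each member's first-CI location and binning those points into a spatial histogram. A minimal sketch, with synthetic point locations standing in for the ensemble's actual detections:

```python
# Sketch of the "pdf generator" concept: aggregate each member's
# first CI point into a 2D histogram over the subdomain.
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical first-CI (x, y) location in km for each of 28 members
first_ci = rng.normal(loc=(200, 180), scale=40, size=(28, 2))

# Bin onto a coarse grid so the counts read as a rough spatial pdf
H, xedges, yedges = np.histogram2d(
    first_ci[:, 0], first_ci[:, 1],
    bins=10, range=[[0, 400], [0, 400]])
pdf = H / H.sum()  # normalize counts to relative frequencies
print(pdf.max())   # peak-probability cell, comparable to the human CI point
```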

I found this type of web display to be very useful and very quick. It also allows us to shift our perspective from mere data mining to information mining, and consequently to think more about visualization of the forecast data. There is much work to be done in this regard, and I hope some of these ideas can be built upon further so that visualization and information mining become more relevant to forecasters.

Opportunities

Every Monday we get a new group of participants, and this week we spent time discussing all the issues from their perspectives, from aviation, airport, and airplane concerns to towering cumulus coming off the mountains. We then discussed the forecast implications, from verification to model data mining to the practical use of forecast data, and again learned the lesson that forecasters already know: you only have so much time before your forecast is due, and it had better contain the answer!

Our forecast has evolved into a 3-hour categorical product of convection initiation for which we determine the domain and the time window. We then go a step further and individually forecast the location where we think the first storm will appear in our domain during our time period, along with the time we think it will occur and our uncertainty in that time. Finally, we assign our confidence that CI will occur within 25 miles of our point. It might sound easy, but it takes some serious practice to spin up 10 people on observations, current and experimental guidance, AND lots of discussion about the scenario or scenarios at play. We have a pretty easygoing group, but we do engage in negotiations about where to draw our consensus categorical risk and over what time period. It is a great experience to hear everyone's interpretation of the uncertainty in any number of factors at play.
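Verifying the 25-mile confidence call comes down to a great-circle distance check between the forecast point and the first observed storm. Here is a small sketch using a standard haversine formula; the two points below are made up for illustration.

```python
# Sketch: did CI occur within 25 miles of the human forecast point?
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

forecast_pt = (35.2, -97.4)   # hypothetical human CI point
observed_ci = (35.5, -97.1)   # hypothetical first observed storm
hit = haversine_miles(*forecast_pt, *observed_ci) <= 25.0
print("verified" if hit else "missed")
```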

Monday was a great practice day with a hurried morning forecast and a terrain-induced CI forecast in the afternoon. Tuesday we were off to a good start with all products nominal, 10+ forecasters, and action back out on the Plains with plenty of uncertainty. The highlights from today included the introduction of ensemble soundings via the BUFKIT display (thanks, Patrick Marsh!). This garnered a lot of attention and will be yet another new, exciting, and valuable visualization tool. The aviation forecasters shared tremendous insights about their experiences and even showed a movie of what they face as airplane traffic gets shuffled around thunderstorms. It was a glimpse of exactly the sorts of problems we hope to address with these experimental models.

These problems are all associated with the CI problem on every scale (cloud, storm, and squall line). The movie highlighted the issue of where and when new convection would fill in the gaps, simply occupy more available airspace, block an airport arrival, or begin to fizzle. Addressing these issues is part of the challenge, and developing guidance relies almost exclusively on how we define convection initiation in the models and observations. We have some great guidance already, and it is clear that as we address more of the challenges of generic CI, we will require even more guidance to account for the sheer number of possibilities of where (dx, dy, and dz), when, how, and if CI occurs.
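To make the definitional point concrete, one simple model-side CI definition flags grid points where simulated reflectivity first reaches some threshold. The sketch below uses 35 dBZ as a plausible new-echo threshold and invented reflectivity fields; it is only one of many possible definitions, not necessarily the one we tested.

```python
# Sketch of one possible model CI definition: flag grid points where
# simulated reflectivity first meets a threshold this hour but did
# not the previous hour. Threshold and fields are assumptions.
import numpy as np

def ci_points(refl_prev, refl_now, threshold=35.0):
    """Return indices of new-echo CI points between two hourly fields."""
    new_echo = (refl_now >= threshold) & (refl_prev < threshold)
    return np.argwhere(new_echo)

rng = np.random.default_rng(2)
refl_t0 = rng.uniform(0, 30, size=(100, 100))   # quiet previous hour
refl_t1 = refl_t0 + rng.uniform(0, 20, size=(100, 100))
print(len(ci_points(refl_t0, refl_t1)), "CI points detected")
```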

As an example, we issued our first nighttime elevated convection forecast. As it turns out, we could be verifying it by observing rain in OK tonight. Our experimental guidance was inadequate, as we have very little data aloft except soundings from the fine-resolution models. So we looked at more conventional models while using what was available from the fine-resolution models, like reflectivity and CI points. This highlights a unique operational challenge that we all face: data overload and time-intensive information extraction. The forecast verification for tonight should be quite revealing and should provide more insight than I am prepared to discuss this evening.