w2polygon2csv: A way to extract statistics

w2polygon2csv is a very useful WDSS-II tool for researchers.  This post shows an example of how to use it.

The question I want to answer: “What are the polarimetric characteristics of gust fronts”?

The methodology is to:

  1. collect a set of polarimetric radar data that have gust fronts
  2. draw polygons around the gust fronts using wg
  3. Use w2polygon2csv to extract all the pixel data within the polygons
  4. Write a R program to do linear regression of the parameters to identify gust fronts
  5. write a program to draw histograms

Step 1:  Data collection and ingest

I downloaded the necessary Level-II data from NCDC’s HAS website ,  untarred it and put it all in a directory named “raw”.  The directory listing looks like this.

listingNote that I have put all the files (even those from different radars) into the same directory.

While you could ingest them all separately into separate directories for each of the radars, it is more convenient (scripting-wise) to fool downstream algorithms by pretending that all the data comes from the same radar (I’ll assume KOUN).  This is done by the following call to ldm2netcdf:

rm -rf KOUN/netcdf; ldm2netcdf -i `pwd`/raw -o `pwd`/KOUN/netcdf -s KOUN -a -1 -p gz --verbose; makeIndex.pl `pwd`/KOUN/netcdf code_index.xml

After this, you will have a directory called KOUN/netcdf within which will be netcdf files corresponding to Reflectivity, Zdr, etc.

Step 2: Drawing polygons

Start up wg and select Controls | Readout.  Change the output directory in the unmarked textbox (the default is probably /tmp) to the location where you want the polygon files saved:

setupBring up your data product (in this case, Reflectivity). Click on Add and then click around the gustfront to draw a polygon around the gustfront.  If you have multiple gust fronts, use the Done button to finish one polygon and start another. When you are finally done, click Save:

polygonAt this point, a polygon XML file is written to the output directory. It will have a name like GustFront_Reflectivity_20120709-003408.xml where GustFront is the name of the data source in wg, Reflectivity the name of the product and the rest of the filename is the time stamp.

Move to the next image, click Reset to clear the existing polygon from the display and draw the set of polygons corresponding to the new time step.  Repeat for all your data cases (you can skip time steps if you want).

Step 3: Extract pixel data using w2polygon2csv

It is helpful to extract data both from within the polygons (“gustfronts”) and from outside the polygons (“not gustfronts”).  The raw data that I want to collect are Reflectivity, AliasedVelocity, RhoHV and Zdr.  I can also collect any other data as long as I have an algorithm that will produce that data in polar format.

Run w2polygon2csv as follows:

#!/bin/sh

makeIndex.pl `pwd`/KOUN/netcdf code_index.xml

rm -rf csv; mkdir -p csv/gf csv/ngf
INPUTS="Reflectivity:00.50 AliasedVelocity:00.50 RhoHV:00.50 Zdr:00.50"
NUMINPUTS=`echo $INPUTS | wc -w`
w2polygon2csv -i `pwd`/KOUN/netcdf/code_index.xml -I "$INPUTS" -o `pwd`/csv/gf -p `pwd`/polygons -F 60 -N $NUMINPUTS --verbose
w2polygon2csv -i `pwd`/KOUN/netcdf/code_index.xml -I "$INPUTS" -o `pwd`/csv/ngf -p `pwd`/polygons -F 60 -N $NUMINPUTS -V --verbose

After the above commands are run, we will have a set of comma-separated-values (CSV) files in csv/gf corresponding to gustfronts and another set of CSV files corresponding to not-gustfronts in csv/ngf.

 

Step 4:  Create a Linear Regression Model to identify gustfronts

If you are using R, you should combine the two sets of files, making to sure add a new column with the value 1 for gustfront pixels and 0 for not-gustfront pixels.  Like this:

head -q csv/gf/* | head -1 | awk '{print $0",GustFront"}' > csv/train.csv
cat csv/gf/*.csv | grep -v Time | awk '{print $0",1"}'  >> csv/train.csv
cat csv/ngf/*.csv | grep -v Time | awk '{print $0",0"}' >> csv/train.csv

The very first line (head) simply carries over the column information from the original CSV files.   The reason for the “grep -v” on the second and third lines is to remove this header line from all subsequent output.  At this point, you will now have a “train.csv” with all the pixel data and you can use R’s read.csv to read it and process it.

The R program to do the linear regression:

data <- read.csv("csv/train.csv")
data <- data[ data$Reflectivity < 30, ]
data <- data[ data$RhoHV < 0.9, ]

numgustfronts <- length( data[data$GustFront > 0.5,1] )
numnegatives  <- length( data[data$GustFront < 0.5,1] )
weights <- data$RhoHV
weights[ data$GustFront > 0.5 ] <- 0.5/numgustfronts
weights[ data$GustFront < 0.5 ] <- 0.5/numnegatives

model2 <- lm( GustFront ~ abs(AliasedVelocity - 6) + ChangeInReflectivity + abs(Kdp_approx) + abs(Reflectivity - 13) + RhoHV + abs(Zdr), data=data, weights=weights )

You may notice that I have additional columns in my output — I used existing WDSS-II tools (beyond the scope of this blog) to produce a ChangeInReflectivity (over time) and to approximate the Kdp.

Step 5: Draw histograms using Python

You could use R itself to draw histograms, but these days, I prefer using Python, and so I will read the individual files from within the Python program itself. My Python program, for simplicity, will read and plot only the gustfront csv files (repeat for the non-gustfront data).

The first part of the Python program reads in the data:

#!/usr/bin/python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import os

(VEL,DELREF,FZAGG,KDP,REF,RHOHV,ZDR,TIME,LAT,LON,NUMDATACOLS)=range(0,11)

PREFIX="gf"; TITLE="gustfronts"
#PREFIX="ngf"; TITLE="non-gustfronts"

indir="/scratch2/lakshman/gustfront/csv/" + PREFIX

# read all the files in the above directory ...
files = os.listdir(indir)
alldata = np.empty([0,NUMDATACOLS])
for file in files:
  try:
     # read file, but convert last-but-two element (which is time) to zero
     data = np.loadtxt(indir + "/" + file, comments="#", delimiter=",", unpack=False, skiprows=1, converters={(NUMDATACOLS-3): lambda s: 0} )
     # remove any points that do not have a valid value
     sel = data[:,0] > -999
     for col in xrange(1,NUMDATACOLS):
        sel = np.logical_and(sel, data[:,col] > -999)
     sel = np.logical_and(sel, data[:,RHOHV] < 0.99 )  # rhohv valid value < 1
     data = data[sel]
     print np.shape(data)
     alldata = np.row_stack( (alldata, data) )
  except:
     print "Ignoring " + file

As I read in the data, I append it to an array called “alldata”, so at the end of the loop, my “alldata” contains all the valid (non-missing) data in the CSV files.

The next step is to define two Python functions, one to compute the nth percentile of the data and the other to plot a nice-looking histogram:

def percentile(a, P):
  # sort a
  a = np.sort(a)
  n = int( round(P*len(a) + 0.5) )
  if n > 1:
     return a[n-2]
  else:
     return a[0]

def plothist(a, title):
  global nrows, ncols, plotno
  axes = plt.subplot(nrows,ncols, plotno)
  weights = np.ones_like(a) / len(a)
  axes.hist(a, numbins, facecolor='green', normed=False, alpha=0.5, weights=weights)
  p = percentile( np.abs(a), 0.1 )
  plt.plot( [p,p], axes.get_ylim(), 'b--' )
  pos = axes.get_ylim()[0] * 0.25 + axes.get_ylim()[1] * 0.75
  axes.annotate( np.around(p,2),
                 xy=(p, pos), xycoords='data',
                 xytext=(50,30), textcoords='offset points',
                 arrowprops=dict(arrowstyle="->",
                      connectionstyle="angle3,angleA=90,angleB=0") )
  p = percentile( np.abs(a), 0.5 )
  plt.plot( [p,p], axes.get_ylim(), 'b--' )
  pos = axes.get_ylim()[0] * 0.75 + axes.get_ylim()[1] * 0.25
  axes.annotate( np.around(p,2),
                 xy=(p, pos), xycoords='data',
                 xytext=(50,30), textcoords='offset points',
                 arrowprops=dict(arrowstyle="->",
                      connectionstyle="angle3,angleA=90,angleB=0") )
  p = percentile( np.abs(a), 0.9 )
  plt.plot( [p,p], axes.get_ylim(), 'b--' )
  axes.annotate( np.around(p,2),
                 xy=(p, np.mean(axes.get_ylim())), xycoords='data',
                 xytext=(-50,30), textcoords='offset points',
                 arrowprops=dict(arrowstyle="->",
                      connectionstyle="angle3,angleA=90,angleB=0") )
  axes.set_ylabel("Frequency")
  axes.set_xlabel(title)
  #plt.title("Distribution of " + title)
  plotno = plotno + 1

Then, simply call the histogram function once per variable:

# process the data
data = alldata
print "Total: ",np.shape(data)

plotno=1
nrows=3
ncols=3
numbins=20

plt.figure(figsize=(15,12))

plothist( np.abs(data[:,VEL]), '|Velocity| (m/s)')
plothist( data[:,REF], 'Reflectivity (dBZ)' )
plothist( data[:,RHOHV], 'RhoHV' )
plothist( np.abs(data[:,ZDR]), '|Zdr| (dB)')
plothist( np.abs(data[:,KDP]), '|Kdp| (deg)')
plothist( data[:,DELREF], 'Change in Reflectivity (dBZ)' )
plothist( data[:,FZAGG], 'Fuzzy Aggregate')

fig = plt.gcf()
fig.suptitle("Polarimetric characteristics of " + TITLE, fontsize=24)
#plt.show()
plt.savefig( PREFIX + 'stats.png') #, bbox_inches='tight')

The final output looks like this:

gfstats

 

Environmental information in w2dualpol

In order to run the hydrometer classification algorithm (HCA) and melting layer detection algorithm (MLDA) in w2dualpol, a “first guess” at the melting layer is required. Previously, the same default first guess was used for all radars. Obviously this solution is not ideal, as on any given day, the melting layer is likely to vary wildly across different radar sites (not to mention that at any given radar site, the melting layer is likely to vary wildly across different days).

For a better solution, you can now pull environmental information (specifically, the 0C wet bulb temperature height) into w2dualpol for a more realistic first guess at the melting layer. This first guess is provided through the use of a SoundingTable. The SoundingTable can be created in one of three ways: (1) The WDSS-II program nse can create this from a RUC analysis field.  (2) The ingest_sounding.pl script reads and converts sounding data from the U. Wyoming website into WDSS-II’s XML format. (3) You can write your own XML table file.

Once you have your SoundingTable, you will want to be sure that it is included in your RadarIndex. Then, in the command for w2dualpol, set the -S flag, and from there the algorithm will use the SoundingTable information to create a first guess at a melting layer. Additionally, the melting layer found from the SoundingTable will be used in instances when the data is not sufficient for the MLDA to run.

It should be noted that once the SoundingTable is greater than 1 day older than the radar data being processed, the SoundingTable will be aged off and the default melting layer will be used. On a related note, we have altered the algorithm such that if more than two hours pass after the MLDA has run, the algorithm will revert to using the melting layer found in the SoundingTable.

A few final notes; If a SoundingTable is present in the RadarIndex but you do not wish to use it, set -S junk and the default value will be used for the melting layer. Additionally, if no SoundingTable is available, there is no need to set -S.

User defined resolution in w2birddensity

A secondary use of weather radars is to detect biological scatterers such as birds, bats, and insects. When birds are detected, the echoes returned to the radar can be used to estimate the density of birds on the ground. This information is valuable in a number of situations, such as when attempting to determine where birds stop while in their migratory journey.

The algorithm w2birddensity, which is based on an algorithm described in detail here, estimates the bird density from radar reflectivity data. This is done by computing a vertical profile of reflectivity (VPR) and making adjustments to the raw reflectivity field based off of the VPR.

Previously, w2birddensity was carried out at a fixed resolution, but we have now given users the option of defining the resolution of their radar data with the -R flag. As with many of our other algorithms, just specify the gate size in km, the radial size in degrees, and the range in km, as shown below. Additionally, we have given the user the option of specifying that they would like more then the default of 5 tilts processed. This is done by setting the -n flag to the number of tilts desired.

w2birddensity -i /localdata/birddensity/code_index.xml -o /tmp/birds -s KAKQ -E ~/WDSS2/gtopo30/radars -R 0.25x0.5x100 -n 5 --verbose
birdSR

Bird reflectivity from KIWA at super-resolution.

BirdLEG

Bird reflectivity from KIWA at legacy resolution.