Matt_assignment_1

My Data :
I chose to look at surface temperature on the inner nest of our WRF domain. Here is an example of a 2D plot of this data.



Part 1: Create 1D Histogram and the Associated PDF

 * 1) Choose 1 variable that is interesting to you, and create a histogram of that variable.
 * Quick Plot: hist(mydata, NBINS) -- try different values of NBINS.
 * What is the best number of bins, why?
 * 2) Using the same variable, create a PDF (normalized).
 * [histo, binval] = hist(mydata, NBINS); bar(binval, histo) will give the same plot as hist(mydata, NBINS).
 * Normalize histo so it is a probability distribution, and LABEL the plot properly, with units and axis labels that make the plot clear.
 * 3) Compute the first 4 moments using a loop over the histogram bins.
 * Compare your values to those of the built-in functions acting on the raw data, such as mean(mydata), var(mydata), etc.

Part 1 Code :

1. The best number of bins was 61, as determined by Scott's rule (Scott, David W. (1979). "On optimal and data-based histograms". //Biometrika// **66** (3): 605–610). This is a reasonable number: with 240x240 observations, a relatively large number of bins lets the reader see some detail in the distribution.
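As a rough illustration of how a bin count follows from Scott's rule, here is a minimal sketch in Python (used here only as a stand-in for Matlab). It assumes the common form of the rule, bin width h = 3.49 * sigma * n^(-1/3), and the function name `scott_bin_count` is hypothetical, not something used in the original analysis.

```python
import math
import statistics

def scott_bin_count(data):
    """Number of equal-width bins from Scott's rule (hypothetical helper)."""
    n = len(data)
    sigma = statistics.stdev(data)              # sample standard deviation
    width = 3.49 * sigma * n ** (-1.0 / 3.0)    # Scott's bin width
    # Round up so the bins cover the full data range.
    return max(1, math.ceil((max(data) - min(data)) / width))
```

The bin width shrinks only as the cube root of the number of observations, so even a large sample such as a 240x240 grid gives a moderate bin count.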



2.
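The normalization asked for in question 2 can be sketched as follows. This is a plain-Python stand-in for the Matlab `hist` call, under the assumption of equal-width bins spanning the data range; the names `histogram` and `to_pdf` are hypothetical.

```python
def histogram(data, nbins):
    """Counts and bin centers for equal-width bins (assumes max > min)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in data:
        # The largest value would land in bin index nbins; keep it in the last bin.
        k = min(int((x - lo) / width), nbins - 1)
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(nbins)]
    return counts, centers, width

def to_pdf(counts, width):
    """Scale counts so that sum(pdf * width) equals 1."""
    total = sum(counts)
    return [c / (total * width) for c in counts]
```

Dividing by the bin width as well as the number of points gives the PDF units of inverse data units (for example, per degree), which is what the y-axis label should state.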

2b.

My mean = 26.1572; Matlab mean = 26.1575

My variance = 0.9534; Matlab variance = 0.95296

My skew = -1.1293; Matlab skew = -1.2157

My kurtosis = 3.5393; Matlab kurtosis = 3.8987
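A loop over histogram bins of the kind the assignment describes might look like the sketch below (Python as a stand-in for Matlab; `moments_from_bins` is a hypothetical name). It assumes each bin's probability is its count divided by the total, and that every point in a bin is represented by the bin center. That substitution is one plausible reason the bin-based values differ from the built-in functions acting on the raw data, especially for the higher moments, which are sensitive to the tails.

```python
def moments_from_bins(counts, centers):
    """Mean, variance, skewness and (non-excess) kurtosis from binned data."""
    total = sum(counts)
    probs = [c / total for c in counts]          # probability mass per bin

    mean = 0.0
    for p, x in zip(probs, centers):
        mean += p * x

    m2 = m3 = m4 = 0.0                           # central moments
    for p, x in zip(probs, centers):
        d = x - mean
        m2 += p * d ** 2
        m3 += p * d ** 3
        m4 += p * d ** 4

    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                          # equals 3 for a normal distribution
    return mean, m2, skew, kurt
```

Here the variance is normalized by N rather than N-1; with 240x240 points the difference between the two conventions is negligible.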

Part 2: Scatter Plot, 2D Histogram/PDF, covariance, and marginal distributions

 * 1) Choose 2 (or more) variables that are interesting to you. One can be the variable used in part 1. Create a scatter plot of the 2 variables against each other.
 * Try scatterhist, or marginhist.m for octave users (you may have to google and download it).
 * Some additional tools are at http://mpo581-hw2.wikispaces.com/Multivariate+display+tools
 * 2) Make a 2D histogram of the same data in the scatter plot of Part 2, Question 1.
 * Try hist2d
 * Play with the size and number of bins. What is best, and why?
 * 3) Normalize your 2D histogram so that it is a true joint PDF (as in Part 1 Question 2).
 * Label the colorbar or contours with appropriate units
 * 4) Compute covariance from the 2D PDF by looping over the PDF bins, as in Eq. 3.27 and Fig. 3.16, p53-54 in the book.
 * Compare this to the covariance you get from the built-in function, cov, acting on the raw data

Part 2 Code :

1.



2. Twenty bins seemed more appropriate here: using 61 bins, as I did in Part 1, really "jumbled" the plot and made it hard to see features. In 2D the same points are spread over many more bins, so each bin's count is noisier.



3.
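Normalizing a 2D histogram into a joint PDF works like the 1D case, except that the counts are divided by the bin area. A minimal sketch, in Python rather than Matlab and assuming equal-width bins in each direction; `hist2d_counts`, `joint_pdf` and `marginal_x` are hypothetical names, not the `hist2d` tool mentioned in the assignment.

```python
def hist2d_counts(x, y, nx, ny):
    """2D counts on an nx-by-ny grid of equal-width bins."""
    x0, y0 = min(x), min(y)
    dx = (max(x) - x0) / nx
    dy = (max(y) - y0) / ny
    counts = [[0] * ny for _ in range(nx)]
    for xi, yi in zip(x, y):
        i = min(int((xi - x0) / dx), nx - 1)   # keep the maximum in the last bin
        j = min(int((yi - y0) / dy), ny - 1)
        counts[i][j] += 1
    return counts, dx, dy

def joint_pdf(counts, dx, dy):
    """Scale counts so the PDF integrates to 1 over the x-y plane."""
    n = sum(sum(row) for row in counts)
    return [[c / (n * dx * dy) for c in row] for row in counts]

def marginal_x(pdf, dy):
    """Marginal PDF of x: integrate the joint PDF over y."""
    return [sum(row) * dy for row in pdf]
```

The joint PDF then has units of inverse (x units times y units), which is what the colorbar label should show.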



4.

My covariance = 0.0040605

Matlab covariance = 0.0045117

The difference was 0.00045117
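A covariance computed by looping over the bins of a 2D histogram could be sketched as below. This is a Python stand-in written in the spirit of the assignment's instructions, not a transcription of the book's equation; it assumes each bin's probability is its count divided by the total and that points are represented by their bin centers. The name `covariance_from_counts` is hypothetical.

```python
def covariance_from_counts(counts, x_centers, y_centers):
    """Covariance of x and y from 2D bin counts and bin centers."""
    n = sum(sum(row) for row in counts)

    # Means of x and y, weighted by the probability in each bin.
    mean_x = mean_y = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            p = c / n
            mean_x += p * x_centers[i]
            mean_y += p * y_centers[j]

    # Probability-weighted sum of products of deviations from the means.
    cov = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            cov += (c / n) * (x_centers[i] - mean_x) * (y_centers[j] - mean_y)
    return cov
```

Because every point is moved to its bin center, the result depends on the bin size, so some difference from the raw-data covariance is expected.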

Part 3: Study a conditional sample

 * 1) Use find in Matlab or where in IDL to find which datapoints fell into one or more parts (subsets, or //conditional samples//) of one of the above 1D or 2D distributions. If you choose to just examine one or two quantiles of a 1D distribution (upper tail vs. lower tail, for example), or just one or two sectors (like upper-right vs. lower-left) of a joint (2D or higher) distribution, indicate these tails or sectors on your histogram plot. Ideally, you might find a bimodal or multimodal distribution, so that when you slice it you are sampling truly distinct modes or 'regimes'.
 * 2) For your conditional samples (ideally, distinct modes or 'regimes'), make plots of the other variables you have for those datapoints. This is why we wanted a multivariate dataset for this exercise: to explain or interpret (using the other variables) the story of the modes or regimes or tails or sectors within some 1D or 2D histogram space.
 * Actually, instead of making multivariable plots of one or two conditional samples, consider (optional) making averages of the other variables for the conditional sample defined by EVERY bin of a histogram. This would essentially give an Importance-weighting function for the distribution (where those other variables define the Importance).
 * It is a little like colorizing the points on a scatterplot based on a 3rd variable, except that you 'average' that third, color variable within each bin.
 * Here is an example for a 1D histogram: [|Figure] from this [|Article]
 * For power tools to do this, see the second row of the table at http://mpo581-hw2.wikispaces.com/Multivariate+display+tools for IDL and Matlab tools.
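The optional bin-averaging idea above can be sketched in a few lines. This is a Python stand-in rather than one of the IDL or Matlab tools linked above, and `bin_average` is a hypothetical name: `x` is the variable that defines the histogram bins and `z` is the other variable being averaged within each bin.

```python
def bin_average(x, z, nbins):
    """Average of z over the points falling in each equal-width bin of x."""
    lo = min(x)
    width = (max(x) - lo) / nbins
    sums = [0.0] * nbins
    counts = [0] * nbins
    for xi, zi in zip(x, z):
        k = min(int((xi - lo) / width), nbins - 1)   # maximum goes in the last bin
        sums[k] += zi
        counts[k] += 1
    # Empty bins have no defined average, so return None for them.
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Plotting the returned averages against the bin centers gives the "colorized histogram" view described in the list.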

Part 3 Code :

1. The green area represents the top 10% of all mixing ratio values in the data, while the red area represents the bottom 10%.
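Selecting the two tails is the job `find` does in Matlab. A minimal sketch of the same idea in Python, assuming a simple rank-based cut (percentile conventions vary between tools); `tail_indices` and `mean_over` are hypothetical names.

```python
def tail_indices(values, fraction=0.10):
    """Indices of the lowest and highest `fraction` of the values."""
    n = len(values)
    k = max(1, int(n * fraction))                     # points per tail
    order = sorted(range(n), key=lambda i: values[i]) # indices sorted by value
    lower = sorted(order[:k])
    upper = sorted(order[-k:])
    return lower, upper

def mean_over(indices, other):
    """Average of another variable over a conditional sample."""
    return sum(other[i] for i in indices) / len(indices)
```

The returned indices can then be used to pull out temperature or wind values at the same grid points, which is the conditional sample examined in the next item.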



2.





Part 4: Scientific Interpretation

 * Give a scientific interpretation of at least 2 of the above results!

1) Looking at the scatterhist plot (Part 2, Figure 1), mixing ratio appears closer to a normal distribution than temperature, which is negatively skewed: most values cluster at the warm end, with a long tail toward cooler temperatures. This makes sense, since most of the model domain is dominated by the warmer temperatures and only regions experiencing precipitation are being significantly cooled. The result is a sharp cutoff on the right side of the temperature histogram and a broader range of cooler temperatures on the left side of the distribution. Another notable feature of the scatterhist plot is the positive correlation between mixing ratio and temperature. This is not surprising, because the Clausius-Clapeyron equation implies that saturation vapor pressure rises with temperature, so warmer air can typically hold higher concentrations of water vapor.

2) Looking at the last figure (Part 3, Figure 2b), relationships between mixing ratio, temperature, u wind, and v wind in the upper 10% of mixing ratio values can be examined. As suggested in the previous section, as mixing ratio increases, temperature increases. It is interesting that as mixing ratio quickly falls off, the corresponding temperature in this range (of mixing ratio) gradually increases. A strong relationship between winds and mixing ratio is not evident in this upper 10% range. It appears that both u and v have a slight bias towards positive values (eastward and northward components), suggesting that regions of highest moisture (and perhaps precipitation) were propagating northeastward.