Assignment

The purpose of this assignment is to solidify some of the themes from Chapter 3 covered in class, using a dataset that is relevant to your research interests.

You can use your own dataset, or you can choose one of the datasets we have on hand:

 * netCDF:
     * How to trim off pieces of large netCDF files easily: http://mpo581-spring2012.wikispaces.com/netCDF+Help
     * Model data with multiple variables (CCSM3 or CCSM4 data can be found on the vis1/ENSO computers, and include precipitation, TS, u and v winds, etc.)
     * Reanalysis data with multiple variables (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html)
 * Textual data:
     * A 180,000-year sediment core with many chemical elements: http://www.rsmas.miami.edu/users/bmapes/teaching/MPO581_2011/Peterson_mudcore_chem.txt
     * Atlantic hurricanes list, buoy time series: http://mpo581-hw2.wikispaces.com/Multivariate+datasets
 * Or any other dataset you may be interested in using, as long as it has multiple variables (or even the same variable at multiple locations)!

Part 1: Create 1D Histogram and the Associated PDF

 * 1) Choose 1 variable that is interesting to you, and create a histogram of that variable.
     * Quick plot: hist(mydata, NBINS). Try different values of NBINS.
     * What is the best number of bins, and why?
 * 2) Using the same variable, create a normalized PDF.
     * [histo, binval] = hist(mydata, NBINS); bar(binval, histo) will give the same plot as hist(mydata, NBINS).
     * Normalize histo so that it is a probability distribution (dividing by the total count gives probabilities that sum to 1; dividing also by the bin width gives a density that integrates to 1), and LABEL the plot properly, with units and axis labels that make the plot clear.
 * 3) Compute the first 4 moments using a loop over the histogram bins.
     * Compare your values to those of the built-in functions acting on the raw data, such as mean(mydata), var(mydata), etc.
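
As a rough illustration of Part 1, the steps above might be sketched as follows in Matlab/Octave. This is only a sketch under stated assumptions: mydata here is a synthetic stand-in series (not one of the course datasets), the names pdf_est, binwidth, mu, m2, m3, and m4 are made up for the example, and "first 4 moments" is interpreted as the mean plus the 2nd to 4th central moments, which may not be the exact convention used in class. The raw-data skewness and kurtosis are computed by hand so that no toolbox is needed.

```matlab
% Synthetic stand-in for your own variable (assumption: replace with real data).
t = linspace(0, 1, 2000);
mydata = 10 + 3*sin(2*pi*5*t) + 2*t.^2;

NBINS = 30;                                  % try other values too
[histo, binval] = hist(mydata, NBINS);       % counts and bin centers
binwidth = binval(2) - binval(1);            % bins are equal width when NBINS is a scalar
pdf_est = histo / (sum(histo) * binwidth);   % density: sum(pdf_est)*binwidth is 1

bar(binval, pdf_est);
xlabel('mydata (units of your variable)');
ylabel('Probability density (1 / units)');
title(sprintf('Normalized histogram, NBINS = %d', NBINS));

% First moment (mean): loop over bins, weighting each bin center
% by the probability mass in that bin.
mu = 0;
for k = 1:NBINS
    mu = mu + binval(k) * pdf_est(k) * binwidth;
end

% Central moments 2 to 4 about that mean.
m2 = 0; m3 = 0; m4 = 0;
for k = 1:NBINS
    p = pdf_est(k) * binwidth;   % probability mass in bin k
    d = binval(k) - mu;          % deviation of bin center from the mean
    m2 = m2 + d^2 * p;
    m3 = m3 + d^3 * p;
    m4 = m4 + d^4 * p;
end
skew_binned = m3 / m2^1.5;       % standardized 3rd moment
kurt_binned = m4 / m2^2;         % standardized 4th moment

% The same quantities from the raw data, for comparison.
dev = mydata - mean(mydata);
raw_var  = mean(dev.^2);         % same as var(mydata, 1), i.e. normalized by N
raw_skew = mean(dev.^3) / raw_var^1.5;
raw_kurt = mean(dev.^4) / raw_var^2;

fprintf('mean:     binned %.4f, raw %.4f\n', mu, mean(mydata));
fprintf('variance: binned %.4f, raw %.4f\n', m2, raw_var);
fprintf('skewness: binned %.4f, raw %.4f\n', skew_binned, raw_skew);
fprintf('kurtosis: binned %.4f, raw %.4f\n', kurt_binned, raw_kurt);
```

The binned and raw values should agree better as the bins get narrower, since the loop treats every data point as if it sat at its bin center. Note that var(mydata) with no second argument normalizes by N-1 rather than N, so a small difference from the binned variance is expected for that reason as well.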

Part 2: Scatter Plot, 2D Histogram/PDF, Covariance, and Marginal Distributions

 * 1) Choose 2 (or more) variables that are interesting to you. One can be the variable used in Part 1. Create a scatter plot of the 2 variables against each other.
     * Try scatterhist, or marginhist.m for Octave users (you may have to search for it and download it).
     * Some additional tools are at http://mpo581-hw2.wikispaces.com/Multivariate+display+tools
 * 2) Make a 2D histogram of the same data shown in the scatter plot of Part 2, Question 1.
     * Try hist2d.
     * Play with the size and number of bins. What is best, and why?
 * 3) Normalize your 2D histogram so that it is a true joint PDF (as in Part 1, Question 2).
     * Label the colorbar or contours with appropriate units.
 * 4) Compute the covariance from the 2D PDF by looping over the PDF bins, as in Eq. 3.27 and Fig. 3.16, pp. 53-54 in the book.
     * Compare this to the covariance you get from the built-in function cov acting on the raw data.
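
One possible way to carry out Part 2 without any downloaded tools is sketched below in Matlab/Octave. It is an illustration under several assumptions: x and y are synthetic stand-in series, the 2D counts are built with accumarray rather than with hist2d (whose interface depends on which version you download), and the covariance loop uses the standard discrete definition, the sum over bins of (x - mean x)(y - mean y) times the probability in the bin, which is presumably what Eq. 3.27 expresses but is not copied from the book. All variable names are made up for the example.

```matlab
% Synthetic stand-ins for two of your variables (assumption: replace with real data).
t = linspace(0, 1, 2000);
x = 10 + 3*sin(2*pi*5*t) + 2*t.^2;
y = 0.5*x + cos(2*pi*3*t);

NX = 20; NY = 20;                             % try other bin counts too
xedges = linspace(min(x), max(x), NX + 1);
yedges = linspace(min(y), max(y), NY + 1);
dx = xedges(2) - xedges(1);
dy = yedges(2) - yedges(1);
xc = xedges(1:end-1) + dx/2;                  % bin centers
yc = yedges(1:end-1) + dy/2;

% Bin index of every point; min(...) puts the maximum value in the last bin.
ix = min(floor((x - xedges(1)) / dx) + 1, NX);
iy = min(floor((y - yedges(1)) / dy) + 1, NY);

counts = accumarray([ix(:), iy(:)], 1, [NX, NY]);    % 2D histogram, x along rows
jointpdf = counts / (sum(counts(:)) * dx * dy);      % integrates to 1 over x and y

% Marginal PDFs: integrate the joint PDF over the other variable.
px = sum(jointpdf, 2) * dy;                   % NX-by-1 density in x
py = sum(jointpdf, 1) * dx;                   % 1-by-NY density in y

imagesc(xc, yc, jointpdf');                   % transpose so rows run along y
axis xy;
xlabel('x (units of x)'); ylabel('y (units of y)');
h = colorbar;
ylabel(h, 'Joint probability density (1 / (x units * y units))');

% Means, then covariance, by looping over the PDF bins.
mux = 0; muy = 0;
for i = 1:NX
    for j = 1:NY
        p = jointpdf(i, j) * dx * dy;         % probability mass in bin (i, j)
        mux = mux + xc(i) * p;
        muy = muy + yc(j) * p;
    end
end
covxy = 0;
for i = 1:NX
    for j = 1:NY
        p = jointpdf(i, j) * dx * dy;
        covxy = covxy + (xc(i) - mux) * (yc(j) - muy) * p;
    end
end

C = cov([x(:), y(:)]);                        % 2-by-2 matrix, normalized by N-1
fprintf('covariance: binned %.4f, cov() %.4f\n', covxy, C(1, 2));
```

Passing a two-column matrix to cov is used here because it returns the full covariance matrix in both Matlab and Octave; the off-diagonal element C(1,2) is the value to compare against the binned estimate.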

Part 3: Study a Conditional Sample

 * 1) Use find in Matlab or where in IDL to find which datapoints fall into one or more parts (subsets, or conditional samples) of one of the above 1D or 2D distributions. If you choose to examine just one or two quantiles of a 1D distribution (upper tail vs. lower tail, for example), or just one or two sectors (like upper-right vs. lower-left) of a joint (2D or higher) distribution, indicate these tails or sectors on your histogram plot. Ideally, you might find a bimodal or multimodal distribution, so that when you slice it you are sampling truly distinct modes or 'regimes'.
 * 2) For your conditional samples (ideally, distinct modes or 'regimes'), make plots of the other variables you have for those datapoints. This is why we wanted a multivariate dataset for this exercise: to explain or interpret (using the other variables) the story of the modes or regimes or tails or sectors within some 1D or 2D histogram space.
     * Optionally, instead of making multivariable plots of one or two conditional samples, consider making averages of the other variables for the conditional sample defined by EVERY bin of a histogram. This would essentially give an Importance-weighting function for the distribution (where those other variables define the Importance).
     * It is a little like colorizing the points on a scatterplot based on a 3rd variable, except that you 'average' that third (color) variable within each bin.
     * Here is an example for a 1D histogram: [|Figure] from this [|Article]
     * Here is another example for 2D histograms: simple number, area weighted, and volume weighted histograms for clouds. [|Clouds_weighted_2dhistograms.pdf]
     * For IDL and Matlab power tools to do this, see the second row of the table at http://mpo581-hw2.wikispaces.com/Multivariate+display+tools
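
The conditional sampling described in Part 3 could look something like the Matlab/Octave sketch below. The assumptions are: x, y, and z are synthetic stand-in series, the tails are taken as the lowest and highest 10% of x (an arbitrary choice for illustration), thresholds come from the sorted data so that no toolbox quantile function is needed, and the per-bin averages are built with accumarray rather than with the tools linked above. All variable names are made up for the example.

```matlab
% Synthetic stand-ins for three of your variables (assumption: replace with real data).
t = linspace(0, 1, 2000);
x = 10 + 3*sin(2*pi*5*t) + 2*t.^2;    % variable that defines the histogram
y = 0.5*x + cos(2*pi*3*t);            % a second variable to examine
z = t;                                % a third variable to average within bins

% Conditional samples: lowest and highest 10% of x.
xs = sort(x);
lo_thresh = xs(round(0.10 * numel(x)));
hi_thresh = xs(round(0.90 * numel(x)));
ilo = find(x <= lo_thresh);           % indices of lower-tail datapoints
ihi = find(x >= hi_thresh);           % indices of upper-tail datapoints

fprintf('mean y in lower tail of x: %.4f\n', mean(y(ilo)));
fprintf('mean y in upper tail of x: %.4f\n', mean(y(ihi)));

% Average of z within EVERY bin of the 1D histogram of x.
NBINS = 20;
edges = linspace(min(x), max(x), NBINS + 1);
bw = edges(2) - edges(1);
centers = edges(1:end-1) + bw/2;
ibin = min(floor((x - edges(1)) / bw) + 1, NBINS);   % bin index of each point

counts = accumarray(ibin(:), 1,    [NBINS, 1]);      % number of points per bin
zsum   = accumarray(ibin(:), z(:), [NBINS, 1]);      % sum of z per bin
zmean  = zsum ./ counts;                             % NaN for any empty bin

subplot(2, 1, 1);
bar(centers, counts);
hold on;
yl = ylim;
plot([lo_thresh lo_thresh], yl, 'r--');              % mark the lower tail
plot([hi_thresh hi_thresh], yl, 'r--');              % mark the upper tail
hold off;
xlabel('x (units of x)'); ylabel('Count');

subplot(2, 1, 2);
plot(centers, zmean, 'o-');
xlabel('x (units of x)'); ylabel('Mean z in bin (units of z)');
```

The same indices ilo and ihi can be used to subset any other variable in the dataset, for example y(ihi), which is what makes it possible to tell the story of a tail or regime using the variables that did not define it.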

Part 4: Scientific Interpretation

 * Give a scientific interpretation of at least 2 of the above results!