Sarah_Assignment_1

Assignment 1.

Part 1: Create 1D Histogram and the Associated PDF
Choose 1 variable that is interesting to you, and create a histogram of that variable. The dataset chosen is 1000+ years of monthly sea surface height data from a CCSM3 model run. I chose an arbitrary 100-year subset, computed the monthly sea surface height anomalies (SSHA), and averaged those anomalies over the North Atlantic Ocean (lat 40-55N, lon 310-350E). Below is the time series of the data: the y-axis is SSHA in centimeters and the x-axis is time in years. The GrADS script below modifies the dataset and produces the time series. (Note: a similar procedure is applied to the ocean surface temperature data used in Part 2.)
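The anomaly step described above can also be sketched in Matlab. This is not the GrADS script used for the assignment; it is a minimal illustration that assumes the area-averaged series starts in January and spans whole years, and it uses synthetic values in place of the CCSM3 output. The names ssh, clim, and ssha are hypothetical.

```matlab
% Hypothetical input: ssh is a column vector of monthly, area-averaged sea
% surface height (cm), starting in January and covering whole years.
% Synthetic values stand in for the model output so the sketch runs alone.
nyears = 100;
t = (1:12*nyears)';
ssh = 5*sin(2*pi*t/12) + randn(size(t));   % seasonal cycle plus noise

% Put each calendar month in its own row: 12 rows by nyears columns.
sshByMonth = reshape(ssh, 12, nyears);

% Climatology: long-term mean of each calendar month.
clim = mean(sshByMonth, 2);

% Anomaly: subtract the climatological mean of the matching calendar month.
ssha = sshByMonth - repmat(clim, 1, nyears);
ssha = ssha(:);                            % back to a single time series
```

Because each calendar month has its own mean removed, the anomalies average to zero up to round-off, which is worth keeping in mind when checking the first moment later on.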
**GrADS script for modifying data set.**


**Matlab script for Part 1.**


 * Quick Plot: hist(mydata, NBINS) -- try different values of NBINS.
 * What is the best number of bins, why? The data are a 100-year time series of monthly sea surface height anomalies (SSHA) averaged over the N. Atlantic Ocean, and most of the values are fairly small in magnitude (< 10 cm). I chose NBINS = 50 to resolve the variability of the anomalies while still keeping a reasonable number of months in each bin (about 24 on average for 1200 months), so the histogram does not become too sparse.
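One way to compare candidate values of NBINS is to look at how many months land in each bin. The sketch below uses a synthetic series as a stand-in for the SSHA data (the scale factor 3.3 cm is an assumption chosen to roughly match the variance reported later), and the list of candidate values is arbitrary.

```matlab
% Synthetic stand-in for the 1200-month SSHA series (cm); an assumption.
mydata = 3.3*randn(1200, 1);

% Candidate bin counts to compare (arbitrary choices for illustration).
nbinsList = [10 25 50 100];

for k = 1:numel(nbinsList)
    counts = hist(mydata, nbinsList(k));   % counts per bin, no plot
    fprintf('NBINS = %3d: mean count per bin = %5.1f, empty bins = %d\n', ...
        nbinsList(k), mean(counts), sum(counts == 0));
end
```

Too few bins hide the shape of the distribution, while too many leave bins that are empty or hold only a handful of months.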


**Plot 1. Histogram of Sea Surface Height Anomalies (SSHA) averaged over the N. Atlantic Ocean**

Using the same variable create a PDF (normalized).
 * [histo, binval] = hist(mydata, NBINS); bar(binval, histo) will give the same plot as hist(mydata, NBINS).
 * Normalize histo so it is a probability distribution, and LABEL the plot properly, with units and axis labels that make the plot clear.
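A minimal sketch of the normalization step, using synthetic data in place of the SSHA series. Here the histogram is scaled so that it integrates to 1 over SSHA (a probability density with units of 1/cm); dividing by sum(histo) alone would instead give the probability per bin. The variable name pdfvals is hypothetical.

```matlab
% Synthetic stand-in for the SSHA series (cm); an assumption.
mydata = 3.3*randn(1200, 1);
NBINS = 50;

% Counts per bin and bin centers.
[histo, binval] = hist(mydata, NBINS);

% hist uses equally spaced bins, so one width applies to all of them.
binwidth = binval(2) - binval(1);

% Scale the counts so that the area under the bars equals 1.
pdfvals = histo / (sum(histo) * binwidth);

bar(binval, pdfvals)
xlabel('SSHA (cm)')
ylabel('Probability density (cm^{-1})')
title('Normalized histogram of SSHA')
```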


**Plot 2. Normalized histogram of SSHA averaged over the N. Atlantic**

Compute the first 4 moments using a loop over the histogram bins.
 * Compare your values to those of the built-in functions acting on the raw data, such as mean(mydata), var(mydata), etc.

|| ==Moment== || ==My calculation== || ==Matlab function== ||
|| 1st: mean || -0.0039 || 2.1242e-16 ||
|| 2nd: variance || 10.8846 || 10.8612 ||
|| 3rd: skewness || -0.1955 || -0.1998 ||
|| 4th: kurtosis || 2.1261 || 2.1286 ||

The 2nd, 3rd, and 4th moments all compare well. The 1st moment is a little off, most likely because of how the values close to zero are binned around the zero marker.
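The bin-loop calculation of the moments can be sketched as follows. This is an illustration on synthetic data, not the script used for the table; it treats every month in a bin as if it sat at the bin center, which is why the results differ slightly from the raw-data values. The raw-data skewness and kurtosis are written out by hand here so that no toolbox function is needed.

```matlab
% Synthetic stand-in for the SSHA series (cm); an assumption.
mydata = 3.3*randn(1200, 1);
NBINS = 50;

[histo, binval] = hist(mydata, NBINS);
p = histo / sum(histo);            % probability of each bin

% 1st moment: probability-weighted mean of the bin centers.
m1 = 0;
for i = 1:NBINS
    m1 = m1 + binval(i) * p(i);
end

% 2nd to 4th central moments about that mean.
m2 = 0; m3 = 0; m4 = 0;
for i = 1:NBINS
    d = binval(i) - m1;
    m2 = m2 + d^2 * p(i);
    m3 = m3 + d^3 * p(i);
    m4 = m4 + d^4 * p(i);
end

variance_h = m2;
skewness_h = m3 / m2^1.5;          % standardized 3rd moment
kurtosis_h = m4 / m2^2;            % standardized 4th moment (normal = 3)

% Same quantities from the raw data, normalizing by N to match the loop.
mu = mean(mydata);
v  = var(mydata, 1);
s  = mean((mydata - mu).^3) / v^1.5;
k  = mean((mydata - mu).^4) / v^2;

fprintf('mean     %9.4f %9.4f\n', m1, mu);
fprintf('variance %9.4f %9.4f\n', variance_h, v);
fprintf('skewness %9.4f %9.4f\n', skewness_h, s);
fprintf('kurtosis %9.4f %9.4f\n', kurtosis_h, k);
```

Note that var(mydata) with no second argument normalizes by N-1 rather than N; for 1200 months the difference is negligible.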

Part 2: Scatter Plot, 2D Histogram/PDF, covariance, and marginal distributions
Choose 2 (or more) variables that are interesting to you. One can be the variable used in part 1. Create a scatter plot of the 2 variables against each other. Here we introduce the dataset consisting of the surface temperature anomalies averaged over the N. Atlantic (calculated in the GrADS script at the top of the page, Part 1).


**Matlab script for Part 2.**


**Plot 3. Scatterplot and histograms of SSHA and Temperature Anomalies**

Make a 2D histogram of the same data in the scatter plot of Part 2, Question 1.
 * Play with the size and number of bins. What is best, and why? The scatterhist function in Matlab uses a default NBINS that is calculated with Scott's rule. This rule selects the bins for each variable from its sample standard deviation and gives a fairly good estimate of the best NBINS.
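For reference, the textbook form of Scott's rule sets the bin width to h = 3.49 * sigma * n^(-1/3), where sigma is the sample standard deviation and n is the number of points. The sketch below applies that form to synthetic data; it is not taken from the scatterhist source, so the exact constant and rounding used internally by Matlab are assumptions.

```matlab
% Synthetic stand-in for one variable, e.g. the SSHA series (cm).
x = 3.3*randn(1200, 1);
n = numel(x);

% Scott's rule: bin width from the sample standard deviation and n.
h = 3.49 * std(x) * n^(-1/3);

% Number of bins needed to span the data range with that width.
nbins = ceil((max(x) - min(x)) / h);

fprintf('Scott bin width = %.3f cm, NBINS = %d\n', h, nbins);
```

Since the width depends on each variable's own standard deviation, the two axes of a scatter plot generally get different bin widths.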


**Plot 4. 2D histogram**

Normalize your 2D histogram so that it is a true joint PDF (as in Part 1 Question 2).
 * Label the colorbar or contours with appropriate units
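A sketch of one way to build and normalize the 2D histogram without toolbox functions, using accumarray. The data are synthetic, the 10 by 10 grid is an illustrative choice, and the temperature units (deg C) are an assumption. The joint PDF is scaled so that it integrates to 1 over both variables.

```matlab
% Synthetic stand-ins: x for SSHA (cm), y for temperature anomalies
% (assumed deg C), with a weak positive relationship between them.
n = 1200;
x = 3.3*randn(n, 1);
y = 0.05*x + 0.4*randn(n, 1);
nbx = 10; nby = 10;

% Equally spaced bin edges spanning each variable.
xedges = linspace(min(x), max(x), nbx + 1);
yedges = linspace(min(y), max(y), nby + 1);
dx = xedges(2) - xedges(1);
dy = yedges(2) - yedges(1);

% Bin index of every point; the maximum value is folded into the last bin.
ix = min(floor((x - xedges(1)) / dx) + 1, nbx);
iy = min(floor((y - yedges(1)) / dy) + 1, nby);

% Count the points falling in each (ix, iy) cell.
counts = accumarray([ix iy], 1, [nbx nby]);

% Normalize so the PDF integrates to 1 over the x-y plane.
jointpdf = counts / (sum(counts(:)) * dx * dy);

% Bin centers for plotting.
xc = xedges(1:end-1) + dx/2;
yc = yedges(1:end-1) + dy/2;

imagesc(xc, yc, jointpdf')      % transpose so x runs along the horizontal
axis xy
xlabel('SSHA (cm)')
ylabel('Temperature anomaly (deg C)')
cb = colorbar;
ylabel(cb, 'Probability density (cm^{-1} deg C^{-1})')
```

Summing jointpdf over one dimension and multiplying by that dimension's bin width gives the marginal PDF of the other variable.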


**Plot 5. Normalized 2D histogram**


 * Compute covariance from the 2D PDF by looping over the PDF bins, as in Eq. 3.27 and Fig. 3.16, pp. 53-54 in the book.
 * Compare this to the covariance you get from the built-in function, cov, acting on the raw data

My covariance calculation: covariance = 0.4774

The built-in Matlab function cov returns the covariance matrix:

|| 10.8612 || 0.5267 ||
|| 0.5267 || 0.1815 ||

The covariance from cov is 0.5267, which compares reasonably well with the 0.4774 I computed by looping over the PDF bins (about 9% lower).
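The bin-loop covariance can be sketched as a discrete sum of (x_i - xbar) * (y_j - ybar) * P(i,j) over all bins, where P(i,j) is the probability of bin (i,j). This form is assumed to correspond to the book's equation, which is not reproduced here. The data are synthetic and the 10 by 10 grid is an illustrative choice.

```matlab
% Synthetic stand-ins: x for SSHA (cm), y for temperature anomalies.
n = 1200;
x = 3.3*randn(n, 1);
y = 0.05*x + 0.4*randn(n, 1);
nbx = 10; nby = 10;

% 2D histogram on equally spaced bins (maximum folded into the last bin).
xedges = linspace(min(x), max(x), nbx + 1);
yedges = linspace(min(y), max(y), nby + 1);
dx = xedges(2) - xedges(1);
dy = yedges(2) - yedges(1);
ix = min(floor((x - xedges(1)) / dx) + 1, nbx);
iy = min(floor((y - yedges(1)) / dy) + 1, nby);
counts = accumarray([ix iy], 1, [nbx nby]);

P  = counts / sum(counts(:));      % probability of each bin
xc = xedges(1:end-1) + dx/2;       % bin centers
yc = yedges(1:end-1) + dy/2;

% Means of x and y from the binned distribution.
xbar = 0; ybar = 0;
for i = 1:nbx
    for j = 1:nby
        xbar = xbar + xc(i) * P(i, j);
        ybar = ybar + yc(j) * P(i, j);
    end
end

% Covariance: probability-weighted product of deviations from the means.
covxy = 0;
for i = 1:nbx
    for j = 1:nby
        covxy = covxy + (xc(i) - xbar) * (yc(j) - ybar) * P(i, j);
    end
end

% Raw-data comparison: off-diagonal element of the covariance matrix.
C = cov([x y]);
fprintf('binned covariance = %.4f, cov() = %.4f\n', covxy, C(1, 2));
```

The diagonal of C holds the variances of the two variables, so C(1,1) should match the variance found in Part 1. With coarse bins, some difference between the two covariance estimates is expected.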

Part 3: Study a conditional sample
Use find in Matlab or where in IDL to find which datapoints fell into one or more parts (subsets, or //conditional samples//) of one of the above 1D or 2D distributions. If you choose to just examine one or two quantiles of a 1D distribution (upper tail vs. lower tail, for example), or just one or two sectors (like upper-right vs. lower-left) of a joint (2D or higher) distribution, indicate these tails or sectors on your histogram plot. Ideally, you might find a bimodal or multimodal distribution, so that when you slice it you are sampling truly distinct modes or 'regimes'.
 * I have chosen to look at an upper tail and a lower tail histogram bin from the 2D histogram in Part 2. There are 10 different bins. I define the "lower tail" as the months whose SSHA values fall in the 2nd smallest SSHA histogram bin in Plot 5, and the "upper tail" as the months whose SSHA values fall in the 2nd highest SSHA histogram bin in Plot 5. I take an index of these times and find the temperature anomalies, salt anomalies, and zonal velocity anomalies that correspond to each sample. I did not use the smallest and highest histogram bins because data are sparse in those bins; I wanted samples containing lots of points.
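The selection of the two conditional samples can be sketched with find as follows. The data are synthetic, only one companion variable (temperature) is shown, and the bin edges are assumed to be 10 equally spaced bins spanning the SSHA range; the names lowerIdx and upperIdx are hypothetical.

```matlab
% Synthetic stand-ins: ssha (cm) and a companion variable temp.
n = 1200;
ssha = 3.3*randn(n, 1);
temp = 0.05*ssha + 0.4*randn(n, 1);

% Edges of 10 equally spaced SSHA bins; bin k spans edges(k) to edges(k+1).
nb = 10;
edges = linspace(min(ssha), max(ssha), nb + 1);

% "Lower tail": months in the 2nd smallest SSHA bin.
lowerIdx = find(ssha >= edges(2) & ssha < edges(3));

% "Upper tail": months in the 2nd highest SSHA bin.
upperIdx = find(ssha >= edges(nb - 1) & ssha < edges(nb));

% The same indices pick out the matching months in any other variable.
tempLower = temp(lowerIdx);
tempUpper = temp(upperIdx);

fprintf('lower tail: %d months, mean temp anomaly = %.3f\n', ...
    numel(lowerIdx), mean(tempLower));
fprintf('upper tail: %d months, mean temp anomaly = %.3f\n', ...
    numel(upperIdx), mean(tempUpper));
```

Because the indices refer to months, they can be reused unchanged for the salt and zonal velocity anomaly series, provided all series share the same time axis.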


**Matlab script for Part 3:**

For your conditional samples (ideally, distinct modes or 'regimes'), make plots of the other variables you have for those datapoints. This is why we wanted a multivariate dataset for this exercise: to explain or interpret (using the other variables) the story of the modes or regimes or tails or sectors within some 1D or 2D histogram space.


**Plot 6: For the "lower tail" SSHA distribution, histograms of the SSHA, temperature anomalies, salt anomalies, and zonal velocity (uvel) anomalies for those same months**


**Plot 7: For the "upper tail" SSHA distribution, histograms of the SSHA, temperature anomalies, salt anomalies, and zonal velocity (uvel) anomalies for those same months**

Part 4: Scientific Interpretation
** Give a scientific interpretation of at least 2 of the above results! ** Looking at the results from part 3:
 * "Lower tail" distribution: By looking only at the months corresponding to the "most-negative" SSHA, we can notice a few things about the distributions of other ocean anomalies present at the same times. As expected, the corresponding ocean surface temperature anomalies are also typically negative when SSHAs are negative. Interestingly, the salt anomalies for this distribution are typically more positive or saltier-than-average. The zonal velocity anomalies tend to be near zero or positive. Perhaps anomalous westerly ocean surface currents bring saltier water to the North Atlantic region from another source region.
 * "Upper tail" distribution: As expected, temperature anomalies are typically positive for positive SSHAs; however, the distribution of temperature anomalies is not nearly as skewed as in the "lower tail" case. Salt anomalies are typically negative, suggesting that the ocean surface waters are diluted. Perhaps this could be explained by either advection of fresher water into the area or anomalous precipitation. The zonal velocity anomaly distribution is fairly normal. Zhang (2008) suggests that SSHAs are a "fingerprint" of the Atlantic Meridional Overturning Circulation (AMOC). The paper suggests that positive SSHAs are typically associated with a stronger AMOC, a weakened subpolar gyre, and anomalous convection near the Labrador and/or Nordic Seas. The fresh rain water would propagate southward, dilute the N. Atlantic surface waters, and produce negative salt anomalies as we see here.

Zhang, R. (2008), Coherent surface-subsurface fingerprint of the Atlantic meridional overturning circulation, Geophys. Res. Lett., 35, L20705, doi:10.1029/2008GL035463.