On this page we explain how to use bivariate splines to forecast the ground-level ozone concentration at the center of Atlanta, using random surfaces over the entire U.S. domain built from the measurements at various EPA stations on the previous days. The following is a map of all EPA stations over the continental U.S., where ground-level ozone concentrations are measured 24 hours a day.
Assume that the ozone concentration in Atlanta on one day at a particular time is a linear functional of the ozone concentration distribution over the continental U.S. at the same time on the previous day. We also assume that this linear functional is continuous. These assumptions are reasonable: the concentration in Atlanta is proportional to the concentration distribution over the entire continental U.S., and under normal circumstances a small change in that distribution results in a small change in the concentration at Atlanta. Thus, f(X) is the ozone concentration value at the center of Atlanta at one hour of one day, and X is the ozone concentration distribution function over the entire continental U.S. at the same hour on the previous day. By the Riesz representation theorem,
f(X) = < X, g >
for some L_2 function g, where < *, * > denotes the standard inner product in L_2 space. As f(X) may contain measurement error, we may assume that
f(X) = < X, g > + epsilon,
where epsilon denotes the measurement error.
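To make the representation concrete, the following minimal sketch approximates the L_2 inner product by a Riemann sum on a grid and checks numerically that the resulting functional is linear. The rectangular domain, the grid size, the weight function g, and the surfaces X1 and X2 are all made-up assumptions for illustration; they are not EPA data and not the spline representation used on this page.

```python
import numpy as np

# Hypothetical rectangular grid standing in for the U.S. domain.
nx, ny = 40, 25
xs = np.linspace(0.0, 1.0, nx)
ys = np.linspace(0.0, 1.0, ny)
GX, GY = np.meshgrid(xs, ys, indexing="ij")
cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])


def inner_product(u, v):
    """Riemann-sum approximation of the L_2 inner product of two gridded surfaces."""
    return float(np.sum(u * v) * cell_area)


# Made-up weight function g (a bump near one location) and two made-up surfaces.
g = np.exp(-((GX - 0.7) ** 2 + (GY - 0.3) ** 2) / 0.02)
X1 = 40.0 + 10.0 * np.sin(2.0 * np.pi * GX)
X2 = 30.0 + 5.0 * np.cos(2.0 * np.pi * GY)


def f(X):
    """Noise-free functional f(X) = <X, g> on the grid."""
    return inner_product(X, g)


# Linearity check: the difference should vanish up to rounding error.
print(abs(f(2.0 * X1 + 3.0 * X2) - (2.0 * f(X1) + 3.0 * f(X2))))
```

Continuity shows up in the same discrete setting through the Cauchy-Schwarz inequality: the change in f caused by a perturbation d of the surface is bounded by the norm of g times the norm of d.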
We use the spline space S^1_5(\triangle) to approximate X for every hour over several days using penalized least squares splines. We use the same spline space S^1_5(\triangle) to approximate g by an empirical estimate, denoted S_g.
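The estimation step can be sketched as follows once both X and g are expanded in one common finite basis, so that the inner product reduces to c^T M b with M the Gram matrix of the basis. This is only an illustration under stated assumptions: a small cosine basis on a rectangle stands in for the spline space S^1_5(\triangle), the data are synthetic, and a ridge-penalized least squares fit stands in for the empirical estimate, whose exact form is not given here. The name estimate_g_coefficients is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grid and stand-in basis (NOT the spline space S^1_5).
nx, ny = 30, 20
xs = np.linspace(0.0, 1.0, nx)
ys = np.linspace(0.0, 1.0, ny)
GX, GY = np.meshgrid(xs, ys, indexing="ij")
gx, gy = GX.ravel(), GY.ravel()
cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])

Phi = np.column_stack([
    np.cos(k * np.pi * gx) * np.cos(l * np.pi * gy)
    for k in range(3) for l in range(3)
])
p = Phi.shape[1]
M = cell_area * Phi.T @ Phi  # Gram matrix: <phi_i, phi_j> by a Riemann sum

# Synthetic data: n surfaces X_i = Phi c_i and responses y_i = <X_i, g> + noise.
n = 13 * 24
C = rng.normal(size=(n, p))          # rows are the coefficients c_i of X_i
b_true = rng.normal(size=p)          # coefficients of the "true" g (made up)
y_obs = C @ M @ b_true + 0.01 * rng.normal(size=n)


def estimate_g_coefficients(C, M, y_obs, rho):
    """Ridge-penalized least squares: minimize ||y - C M b||^2 + rho ||b||^2."""
    A = C @ M
    lhs = A.T @ A + rho * np.eye(M.shape[0])
    rhs = A.T @ y_obs
    return np.linalg.solve(lhs, rhs)


b_hat = estimate_g_coefficients(C, M, y_obs, rho=1e-6)
g_hat = Phi @ b_hat  # grid values of the estimated weight function
```

With a well-conditioned basis and far more samples than coefficients, the penalty mostly guards against instability; with only a few days of data relative to the dimension of the spline space, its role would be more important.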
To forecast the ozone concentration at Atlanta on, say, Sept. 12, 2006, we use the measurements over the 14 days before Sept. 12 to build 13x24 random surfaces X before Sept. 11 and the values of the functional f(X) over the 13 days before Sept. 12, so that we can compute the approximation S_g. Once we have S_g, the predicted ground-level ozone concentration at Atlanta on Sept. 12 is < S_g, X >, where X denotes the random surfaces based on the measurements on Sept. 11.
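The bookkeeping of this step, pairing each surface with the Atlanta value at the same hour on the following day, can be sketched as below. The array layout, the function name lagged_pairs, and the synthetic data are assumptions made for illustration; a plain least squares fit is used here in place of the actual estimate S_g.

```python
import numpy as np


def lagged_pairs(surface_coefs, atlanta_values):
    """Pair surfaces of day d-1 with Atlanta values of day d, hour by hour.

    surface_coefs:  (n_days, 24, p) basis coefficients of the hourly surfaces
    atlanta_values: (n_days, 24) hourly concentrations at Atlanta
    Returns a design matrix of shape ((n_days - 1) * 24, p) and a response vector.
    """
    p = surface_coefs.shape[-1]
    X_prev = surface_coefs[:-1]      # all days except the last one
    y_next = atlanta_values[1:]      # all days except the first one
    return X_prev.reshape(-1, p), y_next.reshape(-1)


# Synthetic stand-in for 14 days of data (not real measurements).
rng = np.random.default_rng(1)
n_days, p = 14, 9
surface_coefs = rng.normal(size=(n_days, 24, p))
w_true = rng.normal(size=p)          # made-up weights playing the role of g
atlanta_values = np.zeros((n_days, 24))
atlanta_values[1:] = surface_coefs[:-1] @ w_true  # day d depends on day d-1

A, y = lagged_pairs(surface_coefs, atlanta_values)   # 13 x 24 training pairs
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Forecast for the day after the last one, from the last day's 24 surfaces.
forecast = surface_coefs[-1] @ w_hat
```

In this layout the last day of surfaces is never used for training; it is reserved for producing the 24 hourly forecasts of the target day.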
We show the predicted values on two different days, based on learning periods of 13 to 20 days, together with the measurements. It is easy to see that our spline predictions are very close to the true measurements. This may be compared with the univariate functional autoregressive ozone concentration prediction method in Damon and Guillas 2002, here used with no exogenous variables. The idea is to consider a time series of functions, each corresponding to the ozone concentration at the location of interest over 24 hours, and then build an autoregressive Hilbertian (ARH) model for this time series. Estimating the autocorrelation operator in a reduced subspace enables predictions. We selected only 5 functional principal components in the dimension reduction step to keep the model parsimonious, given the sample sizes of 13 to 20. As we see below, the forecasts provided by the bivariate spline strategy outperform those of the univariate functional autoregressive method for the same sample sizes. See the manuscript for theoretical results. In the following we provide some numerical examples.
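For readers unfamiliar with the univariate approach described above, here is a rough sketch of an ARH(1)-type forecast. It treats each day as a vector of 24 hourly values, uses ordinary PCA as a discrete stand-in for functional principal components, and estimates the autocorrelation operator in the reduced subspace by least squares on the scores. The function name arh1_forecast and the synthetic curves are assumptions; this is not the implementation of Damon and Guillas.

```python
import numpy as np


def arh1_forecast(curves, n_components=5):
    """One-day-ahead forecast from daily curves of shape (n_days, 24)."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of Vt are the principal directions of the centered curves.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]                 # (k, 24)
    scores = centered @ components.T               # (n_days, k)
    # Least squares estimate of the autocorrelation operator on the scores:
    # scores of day d are regressed on scores of day d-1.
    R, *_ = np.linalg.lstsq(scores[:-1], scores[1:], rcond=None)
    next_scores = scores[-1] @ R
    return mean_curve + next_scores @ components


# Synthetic daily curves (made up): a diurnal bump plus noise, 15 days.
rng = np.random.default_rng(2)
hours = np.arange(24)
base = 40.0 + 20.0 * np.sin(np.pi * hours / 24.0)
curves = base + 3.0 * rng.normal(size=(15, 24))

forecast = arh1_forecast(curves, n_components=5)   # 24 hourly values
```

With 13 to 20 daily curves, the k x k operator is estimated from very few score pairs, which is one reason to keep the number of components small.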
We first show the predictions based on our bivariate spline method.
Next we show the ozone concentration predictions at Atlanta using the 1D method explained above. It was considered to be the best method in the paper by Damon and Guillas, 2002.
It is clear that the predictions based on our bivariate spline method are consistent across learning periods and approximate the exact measurements very closely. However, the 1D method is not consistent. Without knowing the exact values, we cannot tell which learning period yields a good estimate.
Let us present another example. We show the bivariate spline prediction on Sept. 13, 2005 in the following figures.
We now show the ozone concentration predictions at Atlanta using the 1D method explained above, which was considered to be the best method in the paper by Damon and Guillas, 2002.
Again we can see that the predicted values from our bivariate spline method are consistent across different learning periods, while the 1D method is not consistent at all. For some learning periods, the estimated values approximate the maximum ozone concentration very well. More numerical experiments can be found in Miss Bree Ettinger's dissertation, Spring, 2009.