Wageningen University
& ICOS Netherlands


Documentation - CTE2018
Biosphere Oceans Observations Fires Fossil Fuel TM5 Nested Model Assimilation
To learn more about a CarbonTracker Europe component, click on one of the above images.
Or download the full PDF version for convenience.

Ensemble Data Assimilation [go to top]
1.   Introduction
Data assimilation is the name of a process by which observations of the 'state' of a system help to constrain the behavior of the system in time. An example of one of the earliest applications of data assimilation is the system in which the trajectory of a flying rocket is constantly (and rapidly) adjusted based on information of its current position to guide it to its exact final destination. Another example of data assimilation is a weather model that gets updated every few hours with measurements of temperature and other variables, to improve the accuracy of its forecast for the next day, and the next, and the next. Data assimilation is usually a cyclical process, as estimates get refined over time as more observations about the "truth" become available. Mathematically, data assimilation can be done with any number of techniques. For large systems, so-called variational and ensemble techniques have gained most popularity. Because of the size and complexity of the systems studied in most fields, data assimilation projects inevitably include supercomputers that model the known physics of a system. Success in guiding these models in time often depends strongly on the number of observations available to inform on the true system state.

In CarbonTracker, the model that describes the system contains relatively simple descriptions of biospheric and oceanic CO2 exchange, as well as fossil fuel and fire emissions. In time, we alter the behavior of this model by adjusting a set of parameters as described in the next section.

2.   Detailed Description
The four surface flux modules drive instantaneous CO2 fluxes in CarbonTracker according to:

F(x, y, t) = λ(x,y,t) • Fbio(x, y, t) + λ(x,y,t) • Foce(x, y, t) + Fff(x, y, t) + Ffire(x, y, t)

Where λ represents a set of linear scaling factors applied to the fluxes, to be estimated in the assimilation. These scaling factors are the final product of our assimilation and together with the modules determine the fluxes we present in CarbonTracker. Note that no scaling factors are applied to the fossil fuel and fire modules.

2.1   Land-surface classification
The scaling factors λ are estimated for each week and assumed constant over this period. Each scaling factor is associated with a particular gridbox of the global domain. We chose an approach in which the ocean grid boxes are combined into 30 large basins encompassing large-scale ocean circulation features, as in the TransCom inversion study (e.g. Gurney et al., [2002]). The terrestrial biosphere grid boxes are combined up according to ecosystem type as well as geographical location. Thereto, each of the 11 TransCom land regions contains a maximum of 19 ecosystem types summarized in the table below for Europe.

Ecosystem types considered on 1x1 degree for the terrestrial flux inversions is based on Olson, [1992]. Note that we have adjusted the original 29 categories into only 19 regions. This was done mainly to fill the unused categories 16,17, and 18, and to group the similar (from our perspective) categories 23-26+29. The table below shows each vegetation category considered. Percentages indicate the area associated with each category for Europe rounded to one decimal.

Each 1x1 degree pixel of our domain was assigned one of the categories above bases on the Olson category that was most prevalent in the 0.5x0.5 degree underlying area.

2.2   Ensemble Size and Localization
The ensemble system used to solve for the scalar multiplication factors is similar to that in Peters et al. [2005] and based on the square root ensemble Kalman filter of Whitaker and Hamill, [2002]. We have restricted the length of the smoother window to only five weeks as we found the derived flux patterns within Europe and North America to be robustly resolved well within that time. We caution the CarbonTracker users that although the North American and European flux results were found to be robust after five weeks, regions of the world with less dense observational coverage (the tropics, Southern Hemisphere, and parts of Asia) are likely to be poorly observable even after more than a month of transport and therefore less robustly resolved. Although longer assimilation windows, or long prior covariance length-scales, could potentially help to constrain larger scale emission totals from such areas, we focus our analysis here on a region more directly constrained by real atmospheric observations.

Ensemble statistics are created from 150 ensemble members, each with its own background CO2 concentration field to represent the time history (and thus covariances) of the filter. In contrast to our earlier system design, we currently do not apply any localization to the statevector.

2.3   Dynamical Model
In CarbonTracker, the dynamical model is applied to the mean parameter values λ as:

λ tb = (λ  t-2a + λ  t-1 a + λ  p  )   ⁄   3.0

Where "a" refers to analyzed quantities from previous steps, "b" refers to the background values for the new step, and "p" refers to real a-priori determined values that are fixed in time and chosen as part of the inversion set-up. Physically, this model describes that parameter values λ for a new time step are chosen as a combination between optimized values from the two previous time steps, and a fixed prior value. This operation is similar to the simple persistence forecast used in Peters et al. [2005], but represents a smoothing over three time steps thus dampening variations in the forecast of λ b in time. The inclusion of the prior term λ p acts as a regularization [Baker et al., 2006] and ensures that the parameters in our system will eventually revert back to predetermined prior values when there is no information coming from the observations. Note that our dynamical model equation does not include an error term on the dynamical model, for the simple reason that we don't know the error of this model. This is reflected in the treatment of covariance, which is always set to a prior covariance structure and not forecast with our dynamical model.

2.4   Covariance Structure
Prior values for λ p are all 1.0 to yield fluxes that are unchanged from their values predicted in our modules. The prior covariance structure Pp describes the magnitude of the uncertainty on each parameter, plus their correlation in space.

In each of these regions on the northern hemisphere, individual λ(x,y) parameters are coupled through an isentropic covariance structure which makes two boxes i and j at a distance d of each other have a covariance C of

C = 0.64• exp(-d/L).

In this formula the covariance length scale L varies across the globe. Over Boral and Temperate North America where the observation network is relatively dense, L=300km, but in Boreal and Temperate Asia the number of observations constrains a much smaller number of parameters individually and we chose L=1000km. In Europe, with its strongly heterogeneous land-use and land management and large volume of observations available we took L=200km. In the rest of the world, the length scale is taken infiniely large, coupling fully all grid boxes and associated λ's in the tropics and southern hemisphere.

The figure shows ecoregions for Europe (click here for global land ecoregions). Note that there is currently no requirement for ecoregions to be contiguous, and a single scaling factor can be applied to the same vegetation type on both sides of a continent.

Theoretically, this approach leads to a total number of 9835 optimizable scaling factors λ each week, but the actual number is smaller since not every ecosystem type is represented in each TransCom region, and because we decided not to optimize parameters for ice-covered regions, inland water bodies, and desert. The total flux coming out of these last regions is negligibly small. It is important to note that even though many parameters are available to scale the fluxes, the imposed covariance structure reduces the number of degrees of freedom to about 1100 each week.

Furthermore, all ecosystems within tropical TransCom regions are coupled decreasing exponentially with distance since we do not believe the current observing network can constrain tropical fluxes on sub-continental scales, and want to prevent large dipoles to occur in the tropics.

In our standard assimilation, the chosen standard deviation is 80% on land parameters, and 40% on ocean parameters. This reflects more prior confidence in the ocean fluxes than in terrestrial fluxes, as is assumed often in inversion studies and partly reflects the lower variability and larger homogeneity of the ocean fluxes. All parameters have the same variance within the land or ocean domain. Because the parameters multiply the net-flux though, ecosystems with larger weekly mean net fluxes have a larger variance in absolute flux magnitude.

3.   Further Reading




Copyright Wageningen University, October 2018