The general flow of the algorithm used to compute time series is quite straightforward, and is shown in the picture below.
There are roughly 4 steps:
- Generate masks that contain a unique integer for each combination of CCI land cover class and GAUL 1 administrative region. This needs to be done only once.
- For each tile in a biopar product, group all pixels based on the id in the corresponding mask pixel and determine the average pixel value.
- Recombine all of the computed averages into 1 big table.
- Derive other higher level tables, e.g. biopar averages for GAUL1 administrative regions, aggregated over all landcovers.
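The first two steps can be sketched with NumPy as follows. This is a minimal illustration, not the actual implementation: the function names `make_mask` and `grouped_mean`, the ID encoding (region ID times a fixed number of land cover classes, plus the land cover class) and the tiny sample arrays are all assumptions made for the example.

```python
import numpy as np

def make_mask(landcover, region, n_landcover_classes=256):
    """Combine land cover class and region ID into one unique integer per pixel.

    The encoding (region * n_landcover_classes + landcover) is an assumption;
    any scheme that yields a unique integer per combination would do.
    """
    return region.astype(np.int64) * n_landcover_classes + landcover.astype(np.int64)

def grouped_mean(values, mask):
    """Return {mask_id: mean pixel value} for every ID present in the mask."""
    ids, inverse = np.unique(mask.ravel(), return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=values.ravel(), minlength=ids.size)
    counts = np.bincount(inverse, minlength=ids.size)
    return dict(zip(ids.tolist(), (sums / counts).tolist()))

# Hypothetical 2x2 tile: two land cover classes, two regions.
landcover = np.array([[10, 10], [20, 20]])
region = np.array([[1, 1], [1, 2]])
values = np.array([[0.2, 0.4], [0.6, 0.8]])

mask = make_mask(landcover, region)
means = grouped_mean(values, mask)
```

Because the mask only depends on the land cover and region layers, it can be computed once and reused for every product tile that covers the same area.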
CCI mask resolution
CCI land cover masks have a resolution of 300m. This poses a practical issue, as many Biopar products are only available at 1km resolution. The most obvious solution would be to downsample the CCI masks to 1km resolution, but this would result in a loss of accuracy, as the downsampled pixel could have a land cover that is not representative of the underlying pixels in the 300m mask.
Our approach is to keep the masks at 300m resolution, but to increase the resolution of the products to 300m. For instance, one pixel in the source image with value 0.6 would become 9 pixels in the target image, also with value 0.6. As an added benefit, this also increases the resolution that is used to determine in which GAUL region a given pixel lies, which is mostly beneficial for smaller regions.
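A minimal sketch of this kind of upsampling, where every source pixel is simply repeated, could look as follows. The function name `upsample_nearest` and the integer factor of 3 (implied by "one pixel becomes 9 pixels") are assumptions for the example; the real grids may need an explicit resampling step.

```python
import numpy as np

def upsample_nearest(tile, factor=3):
    """Repeat every pixel `factor` times along both axes (no interpolation)."""
    return np.repeat(np.repeat(tile, factor, axis=0), factor, axis=1)

# Hypothetical 1x2 tile at 1km resolution.
tile = np.array([[0.6, 0.1]])
fine = upsample_nearest(tile)

# The single 0.6 pixel is now a 3x3 block of 0.6 values.
block = fine[:, :3]
```

Since values are copied rather than interpolated, the average over a region is only affected through the finer assignment of pixels to land cover classes and regions.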
Due to the presence of clouds, it is quite common for products to contain nodata pixels. These pixels are simply ignored, and we keep track of the number of valid pixels and the total number of pixels for a given region. As long as a region has at least 1 valid pixel, a value is computed for it; if it has only a single valid pixel, the value for that region will be the value of that single pixel. This way of working does imply that sometimes the value of a region can depend on very few pixels, which can cause the time series to be less smooth, or to exceed the long term minima and maxima at a given point in time.
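The bookkeeping described above can be sketched like this. The function name `region_stats` and the nodata sentinel of 255 are assumptions for the example, and a region without any valid pixel is represented here by NaN.

```python
import numpy as np

def region_stats(values, mask, nodata=255.0):
    """Return {mask_id: (mean, n_valid, n_total)}, ignoring nodata pixels."""
    valid = (values != nodata).ravel()
    ids, inverse = np.unique(mask.ravel(), return_inverse=True)
    inverse = inverse.ravel()
    n_total = np.bincount(inverse, minlength=ids.size)
    n_valid = np.bincount(inverse, weights=valid, minlength=ids.size).astype(int)
    # Nodata pixels contribute 0 to the sum, so they do not affect the mean.
    sums = np.bincount(inverse, weights=np.where(valid, values.ravel(), 0.0),
                       minlength=ids.size)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(n_valid > 0, sums / n_valid, np.nan)
    return {int(i): (float(m), int(v), int(t))
            for i, m, v, t in zip(ids, mean, n_valid, n_total)}

# Hypothetical tile: region 1 has one valid pixel, region 2 is fully clouded.
values = np.array([[0.5, 255.0], [255.0, 255.0]])
mask = np.array([[1, 1], [2, 2]])
stats = region_stats(values, mask)
```

Keeping the valid and total counts next to the mean makes it possible to judge afterwards how much data a given value is based on.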
Inaccuracies caused by clouded pixels may also be propagated when the aggregated values for e.g. a region are computed based on the values of its land covers. This is caused by the fact that aggregated values are computed using a weighted average, where the area of a given region is used as its weight. So if an inaccurate value happens to have a large weight because it represents a large area, the inaccuracy will be propagated to the higher level.
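The area-weighted aggregation can be illustrated with a small sketch. The function name `area_weighted_mean`, the sample numbers and the choice to skip entries without a value (NaN) are assumptions for the example.

```python
import numpy as np

def area_weighted_mean(values, areas):
    """Average `values` using `areas` as weights, skipping NaN entries."""
    values = np.asarray(values, dtype=float)
    areas = np.asarray(areas, dtype=float)
    ok = ~np.isnan(values)
    if not ok.any():
        return float("nan")  # nothing to aggregate: the time series has a gap
    return float(np.average(values[ok], weights=areas[ok]))

# Hypothetical land covers within one region: the first covers 90% of the
# area, so its (possibly cloud-affected) value dominates the aggregate.
aggregate = area_weighted_mean([0.2, 0.8], [900.0, 100.0])
```

Here the aggregate is (0.2 * 900 + 0.8 * 100) / 1000 = 0.26, which shows how a single value representing a large area largely determines the result at the higher level.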
The effect of having little data available is shown in this fAPAR graph, for a region in Benin. Note how the long term minima and maxima also converge, as they are based on fewer pixels. Alongside it, the actual 1km fAPAR product for 13 September 2013 is shown, which illustrates how little useful data is available. Nodata pixels are white in this image.
Do note that another approach, where for instance the number of valid pixels would be used as the weight for a given value, would also be inaccurate, so it would not be an improvement. The best solution for this problem will be provided by the V2 Copernicus products, where more advanced gapfilling algorithms are used.
It can also occur that all pixels for a given region are clouded. In this case, the time series will have a gap. This can even be the case for entire countries.