3.1 Data Processing#

Learning Objectives#

  • Carry out an EO workflow on a cloud platform

  • Select suitable data

  • Chain processes to form an EO processing chain/workflow

  • Visualize the results

Introduction#

In this lecture we are going to combine the knowledge and hands-on experience we have gathered so far to create a full EO workflow on a cloud platform. We will

  • define a research question,

  • choose and load the necessary data sources,

  • define the data cube to our needs,

  • use functions to process the data,

  • visualize the result

  • and track the resources we are consuming on the platform.

Case Study: Snow Cover in the Alps#

Introduction to EO Cloud Platforms

Video content in cooperation with Matteo Dall’Amico (MobyGIS - Waterjade) and Federico Di Paolo (MobyGIS - Waterjade).
“How much snow is stored in that mountain, when will it melt?” We answered this question using EO cloud platforms! Have a look: EO4Alps Snow

Research Question#

Snow serves as a water reservoir and is thus important for any hydrological management activity, such as irrigation planning, drinking water supply, or hydropower generation. Knowing precisely when and where snow is present is a critical source of information for these activities. Satellite Earth observation plays an important role in describing the snow cover, both globally and in local mountain ranges, due to its ability to sense information throughout space (complete coverage of the globe) and time (repeated measurements). Our goal is to create a time series of the snow-covered area of the catchment of interest. We will compare this time series to the runoff at the main outlet of the catchment and study the relationship between snow dynamics and runoff.

Approach#

In this exercise we are going to derive the snow cover in an alpine catchment using Sentinel-2 data. Sentinel-2 carries an optical sensor that measures the light reflected from the Earth's surface in different wavelengths, at a 20 m spatial resolution and a 6 day repeat rate. We use the green and SWIR bands to calculate the Normalized Difference Snow Index (NDSI), which is defined as follows:

\[ NDSI = \frac{GREEN - SWIR}{GREEN + SWIR} \]

Snow typically has very high visible (VIS) reflectance and very low reflectance in the shortwave infrared (SWIR), a characteristic used to detect snow by distinguishing it from most cloud types and other surfaces. The NDSI expresses this contrast as a formula and yields a value between -1 and 1; the higher the value, the more probable it is that the surface is covered with snow. To create a binary snow map we apply a threshold of \(NDSI > 0.4\), a commonly used value for discriminating snow-covered from snow-free areas. We then spatially aggregate the snow-free and snow-covered pixels in the catchment area by summing them up. To get the snow-covered area of the catchment we multiply the number of snow-covered pixels by the pixel area (at 20 m resolution, 20 m × 20 m = 400 m² per pixel).

Additionally, we have to deal with cloud cover. We use the Sentinel-2 cloud mask that is provided with the data and exclude all images that have a cloud cover of more than 25 % in our study area. Ideally we should fill the gaps the clouds generate, since they introduce uncertainty; nevertheless, for a first try our approach should be good enough to get a general idea of the snow cover in our area of interest. In the end we receive a time series of the snow-covered area in the catchment. The approach we are using is very basic and involves many assumptions and simplifications. A critical analysis of the workflow and possible improvements follows in Section 3.3 Validation.
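To make the numbers concrete, here is a minimal sketch of the NDSI calculation, thresholding, and area conversion for a toy set of pixel reflectances. The values are invented for illustration and plain NumPy is used instead of the cloud platform; the actual exercise expresses the same steps as openEO processes.

```python
import numpy as np

# Toy reflectances for four pixels (invented values for illustration)
green = np.array([0.80, 0.55, 0.10, 0.30])  # green band, e.g. Sentinel-2 B03
swir = np.array([0.05, 0.10, 0.20, 0.25])   # SWIR band, e.g. Sentinel-2 B11

# NDSI = (GREEN - SWIR) / (GREEN + SWIR), values in [-1, 1]
ndsi = (green - swir) / (green + swir)

# Binary snow map: NDSI > 0.4 -> snow (1), otherwise no snow (0)
snow = ndsi > 0.4

# Snow-covered area = snow pixel count * pixel area (20 m x 20 m = 400 m²)
sca_km2 = snow.sum() * (20 * 20) / 1e6
print(ndsi.round(2))                         # [ 0.88  0.69 -0.33  0.09]
print(snow.astype(int), f"{sca_km2:.4f} km²")
```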

Workflow Description#

  • get data:

    • load_collection()

  • calculate ndsi:

    • filter_bands(), reduce_dimension()

    • creates a -1 to 1 map, 1 signifies high probability of snow

  • create binary snow classification (by threshold):

    • gt(), mask()

    • create a binary snow classification: 0 = no snow, 1 = snow

  • cloud masking:

    • eq(), mask()

    • create a binary cloud mask using the S2 scene classification

    • Apply the mask to the binary snow map: 0 = no snow, 1 = snow, NA = cloud

    • This gives us the cloud-free snow-covered area for our catchment as an image time series (x, y, time, sca)

  • catchment statistics - cloud pixels, no snow pixels, snow pixels:

    • eq(), gt(), merge_cubes(), aggregate_spatial(), sum()

    • create 3 binary data cubes: pixel in catchment, pixel cloud, pixel snow.

    • merge the 3 cubes: x, y, time, band (catchment, cloud, snow)

    • aggregate_spatial: sum up the values to get the total number of pixels per time step per band (catchment, cloud, snow)

    • calculate the percentages of clouds and snow per time step.

  • filter time series according to cloud coverage:

    • filter out the dates that have a cloud coverage > 25 %

  • Plot the time series of the snow-covered area in the catchment (a sketch of the full chain follows after this list).
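The sketch below strings these steps together with the openEO Python client. Treat it as a hedged outline rather than the exercise solution: the back-end URL, collection id, band names, bounding box, and catchment polygon are assumptions, and the catchment/cloud pixel counting via merge_cubes() and the 25 % cloud filter are only indicated in comments; the exercise notebook 31_data_processing.ipynb contains the authoritative workflow.

```python
import openeo

# Connect to an openEO back-end (URL is a placeholder; use your platform's)
connection = openeo.connect("https://openeo.dataspace.copernicus.eu")
connection.authenticate_oidc()

# get data: green (B03), SWIR (B11) and the scene classification layer (SCL)
s2 = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 11.0, "south": 46.6, "east": 11.4, "north": 46.9},
    temporal_extent=["2018-02-01", "2018-06-30"],
    bands=["B03", "B11", "SCL"],
)

# calculate NDSI: band math reduces the band dimension
green = s2.band("B03")
swir = s2.band("B11")
ndsi = (green - swir) / (green + swir)

# binary snow classification by threshold: 1 = snow, 0 = no snow
snowmap = (ndsi > 0.4) * 1.0

# cloud masking: binary cloud mask from the scene classification
# (SCL 8/9 = cloud medium/high probability; a simplification) -- masked
# pixels become NA
scl = s2.band("SCL")
cloud_mask = (scl == 8) | (scl == 9)
snowmap_cloudfree = snowmap.mask(cloud_mask)

# catchment statistics: sum the snow pixels per time step over the catchment
# polygon (the full exercise also counts catchment and cloud pixels via
# merge_cubes() and drops dates with > 25 % cloud cover afterwards)
catchment = {
    "type": "Polygon",
    "coordinates": [[[11.0, 46.6], [11.4, 46.6], [11.4, 46.9],
                     [11.0, 46.9], [11.0, 46.6]]],
}
snow_sum = snowmap_cloudfree.aggregate_spatial(geometries=catchment, reducer="sum")

# run as a batch job and download the aggregated time series
job = snow_sum.execute_batch(title="snow covered area time series")
job.get_results().download_files("data/")
```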

Animated Content: Embed Process Graph#

Exercise#

Now we have covered the most important topics of our use case in theory. Let’s move on to produce some results!

:warning: The applied workflow is a simplified approach used for educational purposes, to learn how to use EO cloud platforms.

Exercise 3.1 Processing

Quiz#

What is the city at the outlet of the catchment? Answer in the exercise: 31_data_processing.ipynb section ‘Region of Interest’

[(x)] Meran
[( )] Innsbruck
[( )] Grenoble

How many images are available in the time range (“2018-02-01” to “2018-06-30”)? Answer in the exercise: 31_data_processing.ipynb section ‘Calculate Catchment Statistics’, might require you to run some additional code blocks

[( )] 0-20
[( )] 21-40
[(x)] 41-60

How many pixels are in the spatially aggregated data cube? (time * x * y * bands) Answer in the exercise: 31_data_processing.ipynb section ‘Calculate Catchment Statistics’

[( )] 50,000,000 - 100,000,000
[(x)] 100,000,001 - 200,000,000
[( )] 200,000,001 - 500,000,000

How many resources did the computation of the data cube take? Answer in the exercise: 31_data_processing.ipynb section ‘Calculate Catchment Statistics’

[(x)] 0 - 20
[( )] 21 - 100
[( )] 101 - 200

How many snow covered pixels are there across all time steps? Answer in the exercise: 31_data_processing.ipynb section ‘Calculate Catchment Statistics’

[( )] 10,000,000 - 50,000,000
[(x)] 50,000,001 - 100,000,000
[( )] 100,000,001 - 200,000,000

At which time step is the maximum snow cover reached? Answer in the exercise: 31_data_processing.ipynb section ‘Calculate Catchment Statistics’

[( )] 2018-02-13
[( )] 2018-02-21
[(x)] 2018-03-08

What is the number of snow covered pixels on that date? Answer in the exercise: 31_data_processing.ipynb section ‘Calculate Catchment Statistics’

[( )] 4,000,001 - 6,000,000
[( )] 0 - 2,000,000
[(x)] 2,000,001 - 4,000,000

What does that represent in area (km2)? Answer in the exercise: 31_data_processing.ipynb section ‘Calculate Catchment Statistics’

[(x)] 201 - 400
[( )] 0 - 200
[( )] 401 - 600