Earth Observation cloud platforms

Learning Objectives

  • Understand why using a platform is useful

  • Differentiate platform offerings

  • Get to know the components and building blocks of a platform

Why do we need EO cloud platforms?

Introduction to EO Cloud Platforms

Video content in cooperation with Jeroen Dries (VITO).
Numbers based on the ESA Annual Sentinel Data Access Report 2022.

Traditional approaches to analyzing Earth Observation (EO) data typically involve several steps: data discovery, data download, data pre-processing, and data analysis. Especially when working with multiple datasets, discovery, download, and access become a major effort. Users need to navigate different interfaces, meet varying access requirements, and deal with heterogeneous data formats. This approach is often time-consuming and requires significant effort to aggregate and harmonize datasets from different providers for a comprehensive analysis.

Figure: EO research without cloud facilities.

EO data volume and the limits of your computer

In the field of Earth Observation, satellite missions like Sentinel-2 provide vast amounts of data that play a crucial role in applications such as environmental monitoring, land cover mapping, and climate analysis. Understanding the volume of data involved in an analysis is critical for planning efficient processing. EO datasets can span terabytes to petabytes, making it impractical to store, manage, and process them entirely on a local computer.

The growing amount of EO data from multiple satellites makes downloading and pre-processing on individual computers or local infrastructures increasingly time-consuming. Within the Copernicus program of the European Union, around 64 million products have been published, amounting to more than 25 petabytes of data. The figure below collects some statistics about the amount of EO data from the Sentinel satellites.

Figure: EO data volume and the limits of your computer.

The following interactive exercise helps you estimate the data volume associated with Sentinel-2 data. The calculator shows the data volumes involved for specific regions and time ranges, which further emphasizes the relevance of EO platforms for scientific analyses. Not convinced of clouds yet? Try the volume calculator below to assess how much space you would have to free up on your hard drive for your next project.
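The interactive calculator cannot be reproduced here. The same kind of back-of-the-envelope estimate can be sketched in a few lines, though. The sketch uses the Sentinel-2 band layout: 13 spectral bands, 4 at 10 m, 6 at 20 m, and 3 at 60 m. Everything else is a simplifying assumption: 2 bytes per pixel, no compression, a 5-day revisit, and no tile overlaps or extra layers. Real products on disk will therefore differ.

```python
import math

# Sentinel-2 MSI: ground sampling distance in metres -> number of spectral bands
BANDS_PER_RESOLUTION = {10: 4, 20: 6, 60: 3}
# Assumption: each pixel is stored as a 16-bit integer, without compression
BYTES_PER_PIXEL = 2


def bytes_per_km2() -> float:
    """Uncompressed size of one acquisition over 1 km² across all 13 bands."""
    total = 0.0
    for resolution_m, n_bands in BANDS_PER_RESOLUTION.items():
        pixels_per_km2 = (1000 / resolution_m) ** 2
        total += n_bands * pixels_per_km2 * BYTES_PER_PIXEL
    return total


def estimate_volume_gb(area_km2: float, days: int, revisit_days: float = 5.0) -> float:
    """Rough uncompressed volume (GB) of all acquisitions over an area and period.

    Assumption: a fixed revisit interval (5 days for the two-satellite
    constellation at the equator); higher latitudes are observed more often.
    """
    acquisitions = math.ceil(days / revisit_days)
    return area_km2 * acquisitions * bytes_per_km2() / 1e9


# Example: one 100 km x 100 km area observed for one year
print(f"{estimate_volume_gb(100 * 100, 365):.1f} GB")  # 81.5 GB
```

A single region of this size already adds up to tens of gigabytes per year. Multi-year studies or continental extents multiply this quickly.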

How can we handle such volumes of data?

Cloud infrastructures and platforms have emerged as viable alternatives to the traditional approach to data analysis described in Figure “EO research without cloud facilities”. These solutions combine data storage and compute resources, so users can run their analysis in close proximity to the data itself. Cloud-based infrastructures minimize the time-consuming steps of data transfer and pre-processing, which lets researchers and analysts focus on the analysis itself.
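Idealised transfer times show why moving computation to the data matters. This sketch assumes the network link runs continuously at its full nominal bandwidth with no protocol overhead, so real downloads take longer.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def transfer_seconds(volume_bytes: float, bandwidth_bit_s: float) -> float:
    """Idealised time to move data over a link running at full bandwidth."""
    return volume_bytes * 8 / bandwidth_bit_s


# About 80 GB (one region for one year) over a 100 Mbit/s connection
print(f"{transfer_seconds(80e9, 100e6) / 3600:.1f} hours")  # 1.8 hours

# The ~25 PB Copernicus archive over a 1 Gbit/s connection
print(f"{transfer_seconds(25e15, 1e9) / SECONDS_PER_YEAR:.1f} years")  # 6.3 years
```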

By utilizing cloud-based resources, users can harness the scalability and flexibility of these platforms to handle the extensive datasets generated by EO missions. Cloud-based EO platforms represent a paradigm shift in EO data analysis, offering a comprehensive ecosystem that seamlessly integrates storage, processing, analysis tools, collaboration, and visualization. These platforms empower users to overcome the challenges posed by large-scale EO data and accelerate scientific advancements in various fields, including environmental monitoring, climate studies, natural resource management, and disaster response.

Types of platforms

Cloud-based EO infrastructures and platforms have emerged to meet the growing demand for efficient data processing, analysis, and visualization close to the data. We can distinguish between infrastructure providers and platform providers.

Infrastructure providers

EO-based infrastructure providers focus on offering the underlying infrastructure necessary for processing, storage, and dissemination of EO data. They provide the computing resources, storage capacity, and networking capabilities required to handle large-scale EO data processing and analysis. These providers often build and maintain data centers and server clusters, ensuring reliable and scalable infrastructure for EO applications.

In comparison to the traditional approach (Figure “EO research without cloud facilities”), this approach allows users to use computing resources close to the EO data (see Figure “EO cloud providers”). Users do not need to download and manage EO data on their own; they can simply use what is already available on the infrastructure. However, infrastructure providers offer no EO-specific services for data discovery, access, visualization, and analysis.

Figure: EO cloud providers.

Examples of EO-based infrastructure providers

  1. Amazon Web Services (AWS): AWS offers a wide range of cloud services, including storage (Amazon S3) and computing (Amazon EC2), which can be leveraged for EO data processing and storage. Many open datasets are available on AWS (https://registry.opendata.aws/).

  2. Google Cloud Platform (GCP): GCP provides infrastructure services like Google Cloud Storage and Google Compute Engine, which can be utilized for EO data management and analysis. Many open datasets are available on GCP (https://cloud.google.com/datasets).

  3. Microsoft Azure: Azure offers cloud-based services such as Azure Storage, Azure Virtual Machines, and Azure Machine Learning, enabling EO applications and workflows.

  4. Open Telekom Cloud: Open Telekom Cloud is a cloud platform offered by Deutsche Telekom. It provides scalable infrastructure resources, including computing, storage, and networking capabilities, suitable for processing and storing large volumes of EO data.

  5. CloudFerro: CloudFerro is a cloud infrastructure provider specializing in geospatial data processing and analysis. It offers scalable and secure cloud resources optimized for EO applications, including high-performance computing, storage, and networking services tailored for EO data processing workflows. Many open datasets are available on CloudFerro (https://cloudferro.com/en/eo-cloud/storage-big-data/).

Platform providers

Platform providers focus on delivering comprehensive EO platforms that combine infrastructure, tools, and services into a cohesive environment. These platforms typically offer a suite of integrated capabilities, including data storage, processing, analysis, visualization, and collaboration tools. They provide a user-friendly interface and simplify the EO data lifecycle, enabling users to access, process, and analyze EO data without managing the underlying infrastructure.

In addition to infrastructure for computing close to the EO data, a platform offers specific Application Programming Interfaces (APIs) for the discovery, access, visualization, exploitation, and analysis of EO data. EO platforms are often deployed on the infrastructure of an infrastructure provider to benefit from the EO data already stored there. Users of a platform can then use harmonized interfaces for all data available on the platform.
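One widely used open standard for such harmonized discovery interfaces is the SpatioTemporal Asset Catalog (STAC) API. The following sketch builds a STAC Item Search request, which a platform answers with a GeoJSON FeatureCollection of matching scenes. Some values are assumptions for illustration: the endpoint URL (Microsoft Planetary Computer), the collection ID `sentinel-2-l2a`, and the bounding box. Check the chosen platform's documentation for the actual identifiers.

```python
import json
import urllib.request

# Example endpoint (assumption): Microsoft Planetary Computer's STAC API search route
STAC_SEARCH_URL = "https://planetarycomputer.microsoft.com/api/stac/v1/search"


def build_search_body(collection: str, bbox: list[float], start: str, end: str, limit: int = 10) -> dict:
    """Build a STAC API Item Search request body."""
    return {
        "collections": [collection],
        "bbox": bbox,                  # [min_lon, min_lat, max_lon, max_lat] in WGS84
        "datetime": f"{start}/{end}",  # RFC 3339 interval
        "limit": limit,                # maximum number of items per page
    }


def search_item_ids(url: str, body: dict) -> list[str]:
    """POST the search and return the IDs of the first page of matching items."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        feature_collection = json.load(response)  # a GeoJSON FeatureCollection
    return [item["id"] for item in feature_collection["features"]]


body = build_search_body(
    "sentinel-2-l2a",  # assumption: collection ID as named on the chosen platform
    [11.0, 46.0, 11.5, 46.5],
    "2022-06-01T00:00:00Z",
    "2022-06-30T23:59:59Z",
)
print(json.dumps(body))
# ids = search_item_ids(STAC_SEARCH_URL, body)  # requires network access
```

The request body follows the STAC API specification rather than a platform-specific format. Pointing `STAC_SEARCH_URL` at another STAC-compliant catalog should therefore work largely unchanged, apart from collection IDs.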

Figure: EO platform providers.

Examples of cloud-based EO platform providers

  1. Google Earth Engine is a platform specifically designed for EO data analysis. It provides access to a vast amount of satellite imagery and geospatial datasets, along with powerful processing capabilities and built-in algorithms.

  2. Sinergise Sentinel Hub is a platform focused on accessing and processing satellite data. It provides APIs and easy-to-use tools for accessing, processing, and visualizing EO data.

  3. Microsoft Planetary Computer is a platform that combines geospatial data and AI capabilities for Earth observation. It provides access to various global datasets, including satellite imagery, climate data, and environmental data. The platform aims to facilitate large-scale data analysis and support sustainable development and conservation efforts.

  4. Euro Data Cube is a platform built on top of various cloud infrastructures. It provides an interactive development environment with standardized access to a range of EO data. It offers a JupyterLab environment for data exploration and analysis, as well as capabilities to run processing jobs.

  5. openEO Platform is a platform based on openEO, which aims to standardize and simplify access to and processing of EO data. It provides a unified API and a common data model, enabling interoperability across multiple EO data providers and processing backends. The platform allows users to run EO workflows on various cloud-based infrastructure providers.
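openEO describes a workflow as a process graph. This is a JSON document whose nodes call predefined processes and reference each other's outputs, so any openEO-compliant backend can execute the same graph. The sketch below writes such a graph by hand for an NDVI computation, using the openEO processes `load_collection`, `ndvi`, and `save_result`. Some values are assumptions that vary between backends: the collection ID `SENTINEL2_L2A`, the band names `B04`/`B08`, and the extent. In practice, the `openeo` Python client generates these graphs for you.

```python
import json


def ndvi_process_graph(west, south, east, north, start, end):
    """Hand-written openEO process graph: load Sentinel-2, compute NDVI, save GeoTIFF."""
    return {
        "load": {
            "process_id": "load_collection",
            "arguments": {
                "id": "SENTINEL2_L2A",  # assumption: collection ID differs per backend
                "spatial_extent": {"west": west, "south": south, "east": east, "north": north},
                "temporal_extent": [start, end],  # end date is excluded
                "bands": ["B04", "B08"],  # assumption: red and near-infrared band names
            },
        },
        "ndvi": {
            "process_id": "ndvi",
            "arguments": {"data": {"from_node": "load"}, "red": "B04", "nir": "B08"},
        },
        "save": {
            "process_id": "save_result",
            "arguments": {"data": {"from_node": "ndvi"}, "format": "GTiff"},
            "result": True,  # marks the final node of the graph
        },
    }


graph = ndvi_process_graph(11.0, 46.0, 11.5, 46.5, "2022-06-01", "2022-07-01")
# A backend receives the graph wrapped as {"process": {"process_graph": ...}}
print(json.dumps({"process": {"process_graph": graph}}, indent=2))
```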

Summary

In summary, EO-based infrastructure providers primarily focus on providing the underlying infrastructure and resources, while platform providers offer integrated environments with a wide range of tools and services to support EO data processing, analysis, and visualization. These two types of providers complement each other in the EO ecosystem, enabling users to access and leverage EO data effectively.

Components of platforms

Cloud-based EO platforms have transformed the way researchers and scientists analyze and utilize EO data. These platforms often follow a three-layered design comprising infrastructure, services, and exploitation interfaces; the layers are often called “tiers”. An example architecture based on this approach is the “Earth Observation Exploitation Platform Common Architecture” (EOEPCA) of ESA (https://eoepca.org). Leveraging the power of cloud computing, EO platforms provide a comprehensive ecosystem that integrates storage, processing, analysis tools, collaboration, visualization, and data exploitation capabilities.

The following overview will explore each layer of the three-layered design and provide examples to illustrate the functionalities and benefits of cloud-based EO platforms in real-world applications. They showcase the diverse range of tools, services, and interfaces (also named “building blocks”) available to store, process, analyze, collaborate, visualize, and exploit EO data effectively within the cloud environment.

Infrastructure & Resource Tier

“The Resource Tier represents the hosting infrastructure and provides the EO data, storage and compute upon which the exploitation platform is deployed.” (Source: EOEPCA Master System Design)

  1. Data Storage: The data storage component may include distributed or parallel file systems (e.g., GPFS, Hadoop HDFS) or object storage services (e.g., Amazon S3, Google Cloud Storage) to securely store and manage EO datasets.

  2. Computing Resources: The computing component can provide virtual machines (e.g., Amazon EC2, Google Compute Engine, OpenStack), container environments (e.g., Docker Engine, Kubernetes), or batch computing systems (e.g., High Performance Computing clusters) for executing data processing and analysis tasks on EO datasets.
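Object storage services such as Amazon S3 or Google Cloud Storage serve files over HTTP and support HTTP range requests. A process running next to the storage can therefore read only the bytes it needs, such as the header of a large image, instead of copying the whole file. Cloud-optimized formats like Cloud Optimized GeoTIFF are designed around this access pattern. The sketch below builds such a request with the Python standard library. The object URL is hypothetical.

```python
import urllib.request

# Hypothetical object URL; public buckets on AWS or GCS expose objects over HTTPS
OBJECT_URL = "https://example-bucket.s3.amazonaws.com/path/to/scene.tif"


def range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build a GET request for `length` bytes starting at byte offset `start`."""
    end = start + length - 1  # HTTP byte ranges are inclusive on both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def read_range(url: str, start: int, length: int) -> bytes:
    """Fetch only the requested bytes; servers answer with 206 Partial Content."""
    with urllib.request.urlopen(range_request(url, start, length)) as response:
        return response.read()


# Read just the first 16 KiB (e.g. an image header) instead of the whole file
request = range_request(OBJECT_URL, 0, 16 * 1024)
print(request.get_header("Range"))  # bytes=0-16383
# data = read_range(OBJECT_URL, 0, 16 * 1024)  # requires network access
```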

Platform Tier

“The Platform Tier represents the Exploitation Platform and the services it offers to end-users.” (Source: EOEPCA Master System Design)

The services of the platform tier can be grouped into data-related and processing-related services. The processing tools and services often rely on the data services to discover and access the data available on the platform.

  1. Data Services

    • Data Catalog: Data available on the platform needs to be described with metadata to be findable by users. Catalog services enable users to annotate, search, and discover EO datasets based on various metadata parameters. Processing and analysis services, such as Open Data Cube or openEO, often use the data catalog to ease the use of EO data.

    • Data Access Service: This service enables users to retrieve and access EO datasets. It may involve APIs, protocols, or data transfer mechanisms such as Open Geospatial Consortium (OGC) Web Services or HTTP services for efficient and secure data access.

    • Data Visualization Service: The visualization component provides standardized web services for the visualization of raster and vector data available on the platform. User interfaces like QGIS or web mapping tools can be used together with those services.

  2. Data Processing and Analysis Tools: This service component may include widely used processing tools like GDAL (Geospatial Data Abstraction Library), remote sensing software like SNAP (Sentinel Application Platform), data cube tools (e.g., xarray and Dask), and APIs (e.g., the openEO API) for performing advanced analysis on EO data.
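The OGC Web Map Service (WMS) is one example of a standardized visualization service. A map image is requested through a plain URL with standardized parameters, which is why clients such as QGIS or web mapping libraries can use any compliant server. This sketch builds a WMS 1.3.0 GetMap URL. The endpoint and the layer name `TRUE_COLOR` are hypothetical; real values come from the service's GetCapabilities document.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; real endpoints and layer names are platform-specific
WMS_ENDPOINT = "https://example-platform.org/wms"


def getmap_url(endpoint, layer, min_lat, min_lon, max_lat, max_lon, width=512, height=512):
    """Build an OGC WMS 1.3.0 GetMap URL returning a PNG map image."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value selects the default style
        "CRS": "EPSG:4326",
        # WMS 1.3.0 with EPSG:4326 expects latitude before longitude
        "BBOX": f"{min_lat},{min_lon},{max_lat},{max_lon}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"


print(getmap_url(WMS_ENDPOINT, "TRUE_COLOR", 46.0, 11.0, 46.5, 11.5))
```

The axis-order comment reflects a common pitfall. WMS 1.1.1 uses longitude first for the same CRS, so mixing versions can silently swap coordinates.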

Exploitation Tier

“The Exploitation Tier represents the end-users who exploit the services of the platform to perform analysis, or using high-level applications built-in on top of the platform’s services.” (Source: EOEPCA Master System Design)

  1. User Interfaces: The exploitation interface component may include web-based interfaces such as web portals (e.g., EO Browser from Sinergise), dashboards (e.g., the Earth Observation Dashboard from NASA, ESA, and JAXA), or web-based development environments (e.g., JupyterLab). All of them let users explore and analyze EO data interactively.

Exercise: Build a platform

Now it is your turn: drag and drop the building blocks of a platform into the correct places in the diagram.

Quiz

What types of “layers” or “tiers” are there in a platform architecture?

[[x]] Infrastructure & Resource Tier
[[ ]] Software Tier
[[x]] Platform Tier
[[ ]] Data Cube Tier
[[x]] Exploitation Tier

What does an infrastructure provider offer?

[[x]] Virtual Machines
[[ ]] Data discovery
[[ ]] Data cubes
[[ ]] Data visualization
[[x]] Data storage

What kind of provider is the Euro Data Cube?

[[ ]] Infrastructure provider
[[x]] Platform provider

When should you consider using an EO cloud platform?

[[x]] I have limited internet bandwidth for data downloading
[[ ]] I have all my data locally on my own servers with lots of computing resources
[[x]] I want to collaborate with colleagues and external users
[[x]] I want to easily make use of processing services
[[x]] I don't want to care about system administration and operations
[[ ]] I rarely have access to the internet

Further reading