The Open Science Journey - Open Science in geospatial, EO and EO cloud platforms#

Learning Objectives#

  • Understand what Open Science is

  • Get to know the FAIR concept

  • Follow the steps of creating Open Science

  • Understand the role of Open Science in geospatial, EO and EO cloud platforms

The Open Science Journey#

Finally let’s see how open science principles are applied in the field of geospatial, earth observation and EO cloud platforms. To begin we will have a look at the open science journey and a research project that has adapted openness and the FAIR principles very well. Then we will have a look at the role open science plays in today’s geospatial and EO world.

This drag and drop game asks you to connect the tasks to their respective step within the open science journey. If you hover over the icons, their description will pop up.

Open Science in the ClirSnow Project#

The ClirSnow Project is a great example of how the concepts of opennes and FAIR are applied to a real world research project.

The Open Science Journey

Video content in collaboration with Michael Matiu (University of Trento).
“It seems like a lot of work the first time you do it. And it is. But once you know how to do it, you will use it in every research project, because it actually makes research so much easier. And, it will boost your research impact and credibility. It is really worth it.”

The Role of Open Source Software in Geospatial - The example of GDAL#

Open Science plays an important role in geospatial. Open source software is a part of that and the Geographic Data Abstraction Library (GDAL) software is a great example of how important open source software is in the geospatial world. Paul Ramsey, the co-founder of the PostGIS extension, has described what GDAL is in a metaphoric way in a mapscaping.com podcast: “GDAL is data plumbing, a bit like an international electrical plug set for traveling — it’s got multiple different shaped plugs. Electricity is “just” electrons moving around. But they can move around as DC, AC, 120 volts or 240 volts. Plus, there are all these different ways you can plug and join electrical things. At the core, electricity is electrons vibrating, but it can be quite complex to get your hair dryer spinning.” Howard Butler, a director of the Open-Source Geospatial Foundation, said about the importance of GDAL: “[…] It’s open, it provides core functionality, I can’t understand how anybody gets anything done without it.“

The Role of Open Source Software in Geospatial - The example of GDAL

Video content in collaboration with Even Rouault (Main Developer of GDAL).

Open Science in EO Cloud Platforms#

  • Code: Workflows and Code can easily be shared on EO Cloud Platforms. There are openly available tutorial notebooks. Workflows can be shared as user defined processes and be reused by the community. There are user forums that share solutions and snippets. OpenEO, a standardized processing API for EO in the cloud, allows code to be portable between different cloud platforms. This increases reprodicibility, collaboration and prevents vendor locks.

    • To Do: Image Slider: openEO Platform Forum, Tutorial Notebooks Microsoft Planetary Computer, User Defined Processes openEO,

  • Results: There are multiple ways to share results created in EO cloud platforms. Ideally they can be ingested into the platform and be made available as collections for other users directly upon creation. If the result comes with appropiate metadata (e.g. according to the STAC specification) they can easily be registered in publicly avialable STAC Catalogues. Cloud Native Data Formats (described in more detail in lesson 2.4 Formats and Performance), like cloud optimized geotiff, are accessible via https requests. So instead of sharing a file, only a URL pointing to the file is shared.

    • To Do: Image Slider: Collection in a Platform, STAC Catalogue, Link to a COG

  • Publication: If a publication is built on top of results produced in an EO cloud platform, the results and code can easily be linked to the publication in one of the forms described aboved. For example, you can publish your openEO process graph and link to it, and provide a link to a STAC Catalogue where the results are accessible.

    • To Do: Example of a Publication where the code is available on a cloud platform

  • FAIRness:

    • Findable: Data is usually presented through a data catalogue (e.g. STAC Catalogues are used in openEO platform and the Microsoft Planetary Computer) that is explicitly made for searching data. In many cases searching data works even without registration on the platform.

    • Accessible: Data access in cloud platforms is usually granted after registration and authentication. Since cloud computing resources can easily be misused a certain degree of access control is necessary.

    • Interoperable: Processing standards like openEO aim at making the code interoperable, which means it is transferable between platforms. Standardised metadata attached to the results,the use of cloud optimized formats and reingestion of the results into the platform guarantee easy uptake of the results right away. Different sources of satellite data are made interoperable by the cloud platform through the use of data cubes and processing on the fly - reprojections, regridding and temporal alignment are enabled on the fly.

    • Reusable: To make results reusable for others, they need to be accessible and have an open license. Ideally, a license of choice can be added to the metadata and the results are reingested into the platform as a public collection, available for everyone.

  • Analysis Ready Data (ARD): Analysis Ready Data are in the context of EO cloud platforms are usually satellite data that have been processed to a minimum set of requirements and organized into a form that allows immediate analysis with a minimum of additional user effort and interoperability both through time and with other datasets. This means for example that atmospheric correction and cloud masking has already been applied to optical data. Many collections on cloud platforms are analysis ready, so that users can directly start the analysis withouth the tedious and technically demanding preprocessing steps. Since ‘analysis ready’ can mean different things to different people, CEOS is working on standardizing what analysis ready data are.

    • To Do: Image of CEOS ARD

References#

Exam#

Search a license from the Creative Commons License Chooser that adheres to the following: proper attribution/citation must be included when used, free for commercial use and Adaptions of the work can be shared, but only under the same or a compatible license.

[( )] CC BY 4.0
[(X)] CC BY-SA 4.0
[( )] CC BY-SA-ND 4.0

Was the license you have just chosen a software license or a data license?

[( )] Software License
[(X)] Data License

Find the Open Research Data Set “Snow cover in the European Alps: Station observations of snow depth and depth of snowfall” on the catalogue OpenAIRE.

Share the DOI link to the data set version v1.3 in the format https://doi.org/10.5281/zenodo.XXXXXXX. Hint the last numbers are 74

[[https://doi.org/10.5281/zenodo.5109574]]

On which repository is the data set registered?

[( )] Integrated Ocean Observing System (https://ioos.noaa.gov)
[( )] PANGAEA (https://pangaea.de)
[(X)] ZENODO (https://zenodo.org)

Which license is used for the data set? Copy the URL to the license here.

[(X)] Creative Commons Attribution 4.0 International
[( )] Creative Commons Attribution-NonCommercial 4.0 International
[( )] Creative Commons Attribution-ShareAlike 4.0 International

Find the open access publication that is connected to the dataset. The one that has been published in “The Cryosphere”. Copy the DOI of the article here in the format https://doi.org/XX.XXXX/tc-XX-XXXX-XXXX. Hint: The DOI ends with 21

[[https://doi.org/10.5194/tc-15-1343-2021]]

Under which license is this course published. You can find this out on the courses GitHub page.

[( )] Massachusets Institute of Technology License (MIT License)
[( )] Creative Commons Attribution-ShareAlike 4.0 International License 
[(X)] Creative Commons Attribution 4.0 International License 

How is data FAIR in a cloud platform? Connect the subjects to the FAIR keywords.

[[Findable] [Accessible] [Interoperable] [Reusable]]
[(X)        ( )          ( )             ( )       ]  STAC Metadata, Metadata Catalogue
[( )        ( )          (X)             ( )       ]  Usage of abstract data cubes instead of different file formats
[( )        (X)          ( )             ( )       ]  Authentication, Login, Free Trial Accounts
[( )        ( )          ( )             (X)       ]  Data licenses attached to collections, provenance of the data is reported