GEF4530 - Practicals D1 - Computing and Visualization infrastructure




An e-infrastructure for Science


E-science is the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, preprints, and print and/or electronic publications.



The e-infrastructure for Science in Norway follows the same structure and provides users with both computing resources (Notur) and post-processing and visualization facilities with large storage capacity (NorStore).

The picture below introduces the data life cycle, from the generation of your model outputs on the Notur computing facility (Abel) to the preservation of your model results in the NorStore archive.


Notur


What is Notur?


Through the Notur project, UNINETT Sigma2 serves the Norwegian computational science community. It provides infrastructure to individuals and groups involved in education and research at Norwegian universities and colleges, and in research and engineering at research institutes and in industry that contribute to the funding of Notur.
The HPC service gives users access to facilities and software with a far greater capacity than is normally available at department and faculty levels. The service is primarily set up to run simulations for research and education purposes, organized as batch jobs.

For running CESM CAM-5.3, such computing facilities are necessary.

Abel computing facility


Abel is a cluster machine:

Abel is a large cluster: a collection of machines (nodes) linked together via an efficient network, providing more than 10000 cores in total. A single node has 16 cores and a total of 64 GB of shared memory. This shared memory can be accessed by all the cores of that node, but a core on another node cannot access it; across nodes, memory is distributed. The image below attempts to summarize these two concepts (shared vs. distributed memory):
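The node layout described above can be turned into a small back-of-the-envelope calculation. The following is a minimal sketch that only uses the two figures given in the text (16 cores and 64 GB per node); the function names are hypothetical and are not part of any Abel tool.

```python
import math

CORES_PER_NODE = 16       # from the text: 16 cores per node
MEMORY_PER_NODE_GB = 64   # from the text: 64 GB shared by the cores of one node


def nodes_needed(total_cores):
    """Smallest number of nodes that provides at least total_cores cores."""
    return math.ceil(total_cores / CORES_PER_NODE)


def memory_per_core_gb():
    """Memory available per core if all cores of a node share it evenly."""
    return MEMORY_PER_NODE_GB / CORES_PER_NODE


def fits_in_shared_memory(total_cores, total_memory_gb):
    """True if the program fits on one node, so all its cores see the same memory."""
    return total_cores <= CORES_PER_NODE and total_memory_gb <= MEMORY_PER_NODE_GB


for cores in (8, 16, 64):
    print(cores, "cores ->", nodes_needed(cores), "node(s)")
```

A program that needs more than 16 cores has to span several nodes, and its processes must then exchange data over the network instead of reading the same memory.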

To use the machine efficiently, Abel runs under the control of a batch system. A single program does not usually use the full machine (1392 CPUs), but many users can fill the machine very quickly with several "small" programs.
The opposite of a batch job is interactive processing, in which a user enters individual commands to be processed immediately. This is what you are used to when working on your laptop or on a UiO server (such as sverdrup.uio.no).
A batch system is needed to make sure all the resources are well utilized, and it is the role of the job scheduler to decide where to run user "jobs". The scheduler tries to optimize the use of resources and to run as many user jobs as possible. It can be seen as a Tetris game (see image below) where each block represents a user job.
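The Tetris analogy can be illustrated with a toy packing routine. This is only a sketch under strong assumptions: every job fits on a single node, only cores are counted, and jobs are placed with a simple first-fit rule. The real scheduler on Abel is far more elaborate, and the function name `first_fit` is hypothetical.

```python
def first_fit(jobs, n_nodes, cores_per_node=16):
    """Place each job on the first node that still has enough free cores.

    jobs is a list of (name, cores) tuples. Returns a pair:
    a dict mapping job name to node index, and a list of jobs left waiting.
    """
    free = [cores_per_node] * n_nodes
    placement = {}
    waiting = []
    for name, cores in jobs:
        for node, available in enumerate(free):
            if cores <= available:
                free[node] -= cores
                placement[name] = node
                break
        else:
            # No node has enough free cores: the job stays in the queue.
            waiting.append(name)
    return placement, waiting


jobs = [("a", 8), ("b", 12), ("c", 8), ("d", 4), ("e", 16)]
placement, waiting = first_fit(jobs, n_nodes=2)
print("running:", placement)
print("waiting:", waiting)
```

In this example the four smaller jobs fill both nodes completely, and the 16-core job has to wait until a whole node becomes free.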

All user jobs must be submitted to the cluster through this batch system. Abel uses SLURM (Simple Linux Utility for Resource Management). The submitted jobs are then routed into a number of queues (depending on the needed resources, e.g. runtime) and sorted according to some priority scheme.
A job will run when the required resources become available.
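The routing and sorting described above can be sketched as follows. The queue names, the one-hour limit, and the numeric priorities are all assumptions made for illustration; they do not describe the actual queue configuration on Abel.

```python
def route(jobs, short_limit_hours=1.0):
    """Split jobs into two hypothetical queues by requested runtime,
    then sort each queue so that the highest priority comes first."""
    queues = {"short": [], "long": []}
    for job in jobs:
        name = "short" if job["hours"] <= short_limit_hours else "long"
        queues[name].append(job)
    for queue in queues.values():
        # sort() is stable: jobs with equal priority keep their submission order.
        queue.sort(key=lambda job: job["priority"], reverse=True)
    return queues


submitted = [
    {"name": "test_run", "hours": 0.5, "priority": 1},
    {"name": "cam_1yr", "hours": 24.0, "priority": 5},
    {"name": "plot_prep", "hours": 0.2, "priority": 3},
    {"name": "cam_10yr", "hours": 96.0, "priority": 2},
]
for name, queue in route(submitted).items():
    print(name, [job["name"] for job in queue])
```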
More information on the Batch system on Abel can be found here.

Available Filesystems on Abel


The following file systems exist on Abel:

Note: the /work/users/* directories are subject to automatic deletion depending on modification time, access time, and the total usage of the file system. The oldest files are deleted first.
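To get a feeling for which of your files are most at risk, you can list the files of a directory from oldest to newest. The sketch below only looks at modification time, whereas the actual clean-up policy also takes access time and total usage into account; the function name `oldest_first` is hypothetical.

```python
import os


def oldest_first(directory):
    """Return (path, modification_time) pairs for the regular files
    directly inside directory, oldest modification first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                mtime = entry.stat(follow_symlinks=False).st_mtime
                entries.append((entry.path, mtime))
    return sorted(entries, key=lambda item: item[1])
```

Calling `oldest_first` on your own directory under /work/users would return the files that have gone longest without modification at the top of the list.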

NorStore


What is NorStore?


NorStore is the Norwegian infrastructure for storing scientific data.

The NorStore facility is divided into two parts:

When running the CAM-5.3 model on Abel, the model outputs are generated and stored in the temporary working area (/work/users/$LOGNAME). As mentioned earlier, the working area on Abel is only temporary storage, so data must be moved to a more permanent storage area where you can easily post-process and visualize your model results.
Model outputs have to be moved from the Abel working area (/work/users/$LOGNAME) to the NorStore project area. You can use scp to copy your data from Abel to NorStore; the detailed procedure will be explained later.
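Until the detailed procedure is given, the general shape of such a copy can be sketched as below. The code only assembles an scp command line and does not run it. The host name norstore.example.org, the project path /projects/EXAMPLE, the case directory my_case, and the user alice are placeholders invented for illustration, not the real NorStore values.

```python
import os
import shlex


def build_scp_command(case_dir, remote_host, remote_dir, user=None):
    """Assemble a recursive scp command that copies a case directory
    from the Abel working area to a remote project area."""
    # Fall back to $LOGNAME, as used for the working area in the text.
    user = user or os.environ.get("LOGNAME", "username")
    source = f"/work/users/{user}/{case_dir}"
    destination = f"{user}@{remote_host}:{remote_dir}"
    return ["scp", "-r", source, destination]


# Placeholder values only; replace them with the ones given in the course.
command = build_scp_command(
    "my_case", "norstore.example.org", "/projects/EXAMPLE", user="alice"
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so a path containing spaces would still be passed to scp as a single argument.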

Cruncher


Once your model outputs have been moved to NorStore, you can start post-processing and generating plots. The machine you will use for post-processing and visualizing your data is called cruncher.norstore.uio.no.
The main advantage of using this machine is that your data are directly accessible from cruncher and the post-processing and visualization packages we need are already available (see here for a more complete list).

Available Filesystems on Cruncher



The following file systems exist on cruncher: