Overview
Teaching: 10 min
Exercises: 5 min
Questions
How do I find my way around the UIO Galaxy eduPortal?
How do I interact with a Python/PySpark Jupyter notebook?
Objectives
To gain familiarity with the various panes in the UIO Galaxy eduPortal
To gain familiarity with the buttons and options in the PySpark Jupyter notebook
To be able to manage your Galaxy history
You have been using Python to analyze your data, but as the volume and complexity of that data grow, you are ready to take a further step. This lesson will teach you how to start using PySpark and introduce you to the map-reduce programming model.
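To give a first feel for the map-reduce model before opening a notebook, here is a minimal sketch in plain Python (no Spark required). The sample sentences and the helper names `to_pairs` and `merge` are made up for illustration. The map step turns each line into (word, 1) pairs and the reduce step adds up the counts per word.

```python
from functools import reduce

# Hypothetical sample data: two short lines of text.
lines = ["spark makes big data simple", "big data needs big tools"]

def to_pairs(line):
    """Map step: turn one line into a list of (word, 1) pairs."""
    return [(word, 1) for word in line.split()]

# Apply the map step to every line and flatten the result.
mapped = [pair for line in lines for pair in to_pairs(line)]

def merge(counts, pair):
    """Reduce step: add the count of one pair to the running totals."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

# Fold all pairs into a single dictionary of word counts.
word_counts = reduce(merge, mapped, {})
print(word_counts)
```

In PySpark the same idea is usually expressed with RDD operations such as `flatMap`, `map` and `reduceByKey`, which let Spark distribute both steps across many processors.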
To make things easier and avoid installing Spark on your laptop, we will be using the UIO Galaxy eduPortal. If you haven’t received a login and password yet, don’t panic. This can be handled in a few minutes during the workshop.
Remark: without changing your PySpark code, you can scale up to a hundred processors on UIO HPC Abel… For more information see
We’ll be using the UIO Galaxy eduPortal. Galaxy is an open source, web-based platform for data-intensive research that was initially developed for biomedical research.
Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make data-intensive analysis accessible to research scientists who do not have extensive computer programming experience. To learn more, take one of our Galaxy tours.
**For most of the images below, you can click to view a short video or get detailed documentation on the corresponding subject.**
Login panel
Basic layout
Start a PySpark Jupyter notebook
A Jupyter notebook can be started either from an existing dataset in your History or from our Jupyter notebook template for Python 3.
Then you should get:
Key Points
Use the UIO Galaxy eduPortal to start a PySpark Jupyter notebook.