GEO4060 Project Assignment 2017
A toolset for processing observational data
Requirements
There are some requirements on tools and packaging:
- For this project use the GitHub code repository https://github.uio.no/annefou/GEO4060_2017
- Create your own branch (named after your username) and commit all your code regularly. Work in your git branch rather than keeping code outside the repository.
- Within your branch, create a directory called Project where you will store your project source code and documentation.
- Do not forget to comment your code.
- The project must be fully automatically buildable, runnable, and testable on a generic Linux desktop, using at least the GNU Fortran compiler, version 5.2.0 or later.
- Make sure you define Fortran 2003 classes for your datatypes.
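As an illustration of what a Fortran 2003 class can look like, here is a minimal sketch of a derived type with type-bound procedures. The names obs_column_mod, obs_column, init and nrows are hypothetical, and the sketch assumes a column that holds only real values; your own design will need to cover integer and string columns as well.

```fortran
module obs_column_mod
  implicit none
  private
  public :: obs_column

  ! Hypothetical class describing one column of an observation file.
  type :: obs_column
    character(len=:), allocatable :: name       ! column name taken from the header
    real, allocatable             :: values(:)  ! column values (real column assumed)
  contains
    procedure :: init
    procedure :: nrows
  end type obs_column

contains

  ! Store the column name and a copy of its values.
  subroutine init(self, name, values)
    class(obs_column), intent(inout) :: self
    character(len=*),  intent(in)    :: name
    real,              intent(in)    :: values(:)
    self%name   = trim(name)
    self%values = values      ! allocated automatically on assignment
  end subroutine init

  ! Number of rows stored in the column (0 if not initialised).
  pure integer function nrows(self)
    class(obs_column), intent(in) :: self
    if (allocated(self%values)) then
      nrows = size(self%values)
    else
      nrows = 0
    end if
  end function nrows

end module obs_column_mod
```

A caller would declare type(obs_column) :: col, then use call col%init('lat@hdr', [12.5, 32.5]) and col%nrows().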
Input data will be provided in two different formats, ASCII (text) and binary, both described below.
Examples of files stored in these two formats can be found here. However, the toolset is meant to be used with large files, and you should make no assumption about their sizes.
ASCII (text) data format
We will get data in a column-based ASCII format with a header containing the name of each column. In each row, including the header, columns are separated by blanks and/or tabs (in Fortran, use achar(9) to identify the tab character). The number of columns and rows can vary from one file to another, and you should not make any assumption about them when processing an input file.
One ASCII (text) observation file contains:
- A header containing column names as strings.
- The following rows have the same width (number of columns) as the header row but contain values. The type of each column depends on the column name and can be integer, real or string (with no more than 8 characters and delimited by single quotes).
expver lat@hdr lon@hdr obsvalue@body
'0001' 12.5 40.6 214.3
'0002' 32.5 30.6 263.101
We assume all ASCII input files have ".txt" as a file extension.
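Splitting a row into its columns can be sketched with the verify and scan intrinsics, as below. The names split_mod and split_fields are hypothetical. The sketch assumes that quoted strings contain no blanks or tabs, and it returns the start and end position of each field rather than copies of the text, so that no maximum field length has to be assumed.

```fortran
module split_mod
  implicit none
  private
  public :: split_fields

  ! Field separators: blank and tab (achar(9)).
  character(len=*), parameter :: separators = ' ' // achar(9)

contains

  ! Locate the fields of LINE. On return, field i is line(first(i):last(i)).
  subroutine split_fields(line, first, last)
    character(len=*),     intent(in)  :: line
    integer, allocatable, intent(out) :: first(:), last(:)
    integer :: pass, pos, istart, iend, n

    ! Pass 1 counts the fields, pass 2 stores their positions.
    do pass = 1, 2
      n = 0
      pos = 1
      do while (pos <= len(line))
        ! Skip separators: first character that is not one.
        istart = verify(line(pos:), separators)
        if (istart == 0) exit                 ! only separators remain
        istart = istart + pos - 1
        ! The field ends just before the next separator, if any.
        iend = scan(line(istart:), separators)
        if (iend == 0) then
          iend = len(line)
        else
          iend = istart + iend - 2
        end if
        n = n + 1
        if (pass == 2) then
          first(n) = istart
          last(n)  = iend
        end if
        pos = iend + 1
      end do
      if (pass == 1) allocate(first(n), last(n))
    end do
  end subroutine split_fields

end module split_mod
```

For the row '0001' 12.5 40.6 214.3 this reports four fields. The number of fields found in the header row gives the number of columns to expect in every following row.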
Binary format
The binary format is similar to the ASCII (text) format.
The first line of the binary file contains a header with all the column names.
All the rows, including the header, are written as strings, each column being separated by blanks and/or tabs.
We assume all binary input files have ".bin" as a file extension.
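Since the format of a file is identified by its extension, the check can be sketched as follows. The names file_format_mod, format_from_name and the fmt_* constants are hypothetical, and the comparison is assumed to be case-sensitive (".TXT" would not be recognised).

```fortran
module file_format_mod
  implicit none
  private
  public :: format_from_name, fmt_unknown, fmt_text, fmt_binary

  ! Hypothetical codes for the supported formats.
  integer, parameter :: fmt_unknown = 0, fmt_text = 1, fmt_binary = 2

contains

  ! Return the format code implied by the extension of FILENAME.
  pure integer function format_from_name(filename) result(fmt)
    character(len=*), intent(in) :: filename
    integer :: dot

    fmt = fmt_unknown
    ! Position of the last '.' in the name (0 if there is none).
    dot = index(trim(filename), '.', back=.true.)
    if (dot == 0) return

    select case (filename(dot:len_trim(filename)))
    case ('.txt')
      fmt = fmt_text
    case ('.bin')
      fmt = fmt_binary
    end select
  end function format_from_name

end module file_format_mod
```

Returning a separate code for unknown extensions lets the main program report an error instead of guessing a format.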
Work to do
Main program
Create a single main program called "obs.x" with the following options:
- -i: followed by an input filename to process. For instance:
obs.x -i conv_ofb.txt
- -o: followed by an output filename to store the output results. For instance:
obs.x -o conv_ofb.bin
The file extension gives the output format: ".bin" for binary or ".txt" for text/ASCII files.
- -s: select a subset of columns. The columns to select are given as a comma-separated list of column names after "-s". For instance:
obs.x -s lat@hdr,lon@hdr,obsvalue@hdr
- -w: select a subset of rows. The condition is given as a column name followed by a relational operator (/=, =, <, >, <=, >=) and a value. The relational operators have the same meaning as in Fortran. Several conditions can be given on a single command line by repeating "-w". In a shell, quote any condition containing "<" or ">" so that it is not interpreted as a redirection. For instance:
obs.x -w 'lat@hdr>=50.0' -w 'lat@hdr<70.0'
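A "-w" condition has to be split into a column name, an operator and a value before it can be evaluated. The following is one possible sketch; the names condition_mod and parse_condition are hypothetical, and it assumes that column names never contain the characters /, =, < or >.

```fortran
module condition_mod
  implicit none
  private
  public :: parse_condition

contains

  ! Split COND (e.g. "lat@hdr>=50.0") into column name, operator and value.
  ! OK is .false. if no valid operator is found or a part is empty.
  subroutine parse_condition(cond, column, op, value, ok)
    character(len=*),              intent(in)  :: cond
    character(len=:), allocatable, intent(out) :: column, op, value
    logical,                       intent(out) :: ok
    integer :: pos, oplen

    column = ''
    op     = ''
    value  = ''
    ok     = .false.

    ! The operator starts at the first character that can begin one.
    pos = scan(cond, '/=<>')
    if (pos <= 1) return          ! no operator, or no column name before it

    ! Try the two-character operators first so ">=" is not read as ">".
    oplen = 0
    if (pos < len(cond)) then
      select case (cond(pos:pos+1))
      case ('/=', '<=', '>=')
        oplen = 2
      end select
    end if
    if (oplen == 0 .and. cond(pos:pos) /= '/') oplen = 1
    if (oplen == 0) return        ! a lone "/" is not an operator

    column = trim(adjustl(cond(:pos-1)))
    op     = cond(pos:pos+oplen-1)
    value  = trim(adjustl(cond(pos+oplen:)))
    ok     = len(column) > 0 .and. len(value) > 0
  end subroutine parse_condition

end module condition_mod
```

The value is returned as text. Converting it to the type of the named column, and deciding how a value such as NULL is compared, is left to the caller.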
All these options can be combined. For instance:
obs.x -i conv_ofb.txt -o conv_sel.txt -s lat@hdr,lon@hdr,obsvalue@hdr
obs.x -i conv_ofb.txt -o conv_sel.txt -s lat@hdr,lon@hdr,obsvalue@hdr \
-w obsvalue/=NULL
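The options can be read with the Fortran 2003 intrinsics command_argument_count and get_command_argument. The sketch below wraps the latter in a hypothetical function get_arg (in a hypothetical module cli_mod) that returns an argument of any length, so that no maximum length has to be assumed for filenames or column lists.

```fortran
module cli_mod
  implicit none
  private
  public :: get_arg

contains

  ! Return command-line argument number I as a string of exactly the
  ! right length, or an empty string if there is no such argument.
  function get_arg(i) result(arg)
    integer, intent(in) :: i
    character(len=:), allocatable :: arg
    integer :: n, stat

    ! First call: ask only for the length of the argument.
    call get_command_argument(i, length=n, status=stat)
    if (stat > 0) then            ! retrieval failed, e.g. I is out of range
      arg = ''
      return
    end if

    ! Second call: fetch the value into a buffer of that length.
    allocate(character(len=n) :: arg)
    if (n > 0) call get_command_argument(i, value=arg)
  end function get_arg

end module cli_mod
```

The main program can then loop from 1 to command_argument_count(), compare each get_arg(i) with "-i", "-o", "-s" and "-w", and take the following argument as the option value.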
Tests
Create tests for all the classes and methods defined in your Fortran 2003 module. These tests will be performed when running the command:
obs.x test
This set of tests should return "ALL TESTS PASSED" if successful or "X TESTS FAILED" if X tests failed (X is an integer).
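One way to produce the required summary is to count failures in a small class, as sketched below. The names test_report_mod, test_suite, check and summary are hypothetical; only the two summary strings come from the requirement above.

```fortran
module test_report_mod
  implicit none
  private
  public :: test_suite

  ! Hypothetical helper that counts failed checks.
  type :: test_suite
    integer :: nfailed = 0
  contains
    procedure :: check
    procedure :: summary
  end type test_suite

contains

  ! Record one test result; print the label of a failing test.
  subroutine check(self, condition, label)
    class(test_suite), intent(inout) :: self
    logical,           intent(in)    :: condition
    character(len=*),  intent(in)    :: label
    if (.not. condition) then
      self%nfailed = self%nfailed + 1
      write(*, '(2a)') 'FAILED: ', label
    end if
  end subroutine check

  ! Build the final message required by the assignment.
  function summary(self) result(msg)
    class(test_suite), intent(in) :: self
    character(len=:), allocatable :: msg
    character(len=20) :: buf
    if (self%nfailed == 0) then
      msg = 'ALL TESTS PASSED'
    else
      write(buf, '(i0)') self%nfailed
      msg = trim(buf) // ' TESTS FAILED'
    end if
  end function summary

end module test_report_mod
```

Each test calls check with a logical expression and a label, and the test driver prints summary() once at the end.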
Compression
- Use your program to convert a text/ascii input file into a binary file:
obs.x -i conv_ofb.txt -o conv_ofb.bin
Compare the sizes of these two files. Is the result expected? Explain why.
- Suggest a more compressed binary format.
- Implement read and write methods for your new binary format.
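For the size comparison, file sizes can be queried from within Fortran using the size= specifier of the inquire statement (Fortran 2003). The sketch below uses the hypothetical names file_size_mod and file_size_bytes. The standard expresses the size in file storage units; the sketch assumes GNU Fortran, where one unit is one byte.

```fortran
module file_size_mod
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: file_size_bytes

contains

  ! Size of FILENAME in file storage units (bytes with GNU Fortran),
  ! or -1 if the file does not exist.
  function file_size_bytes(filename) result(nbytes)
    character(len=*), intent(in) :: filename
    integer(int64) :: nbytes
    logical :: found

    inquire(file=filename, exist=found, size=nbytes)
    if (.not. found) nbytes = -1
  end function file_size_bytes

end module file_size_mod
```

A 64-bit integer is used for the result because the toolset is meant for large files, whose sizes may not fit in a default integer.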
The final time of delivery
The project must be delivered by Friday 26.05.2017 at 23:59.