- Principles :
Data files are transferred to the data repository site with two goals :
- Maintenance of a unique collection of datasets
- Dissemination to colleague scientists.
This implies that the files have to be portable and simple to read, provide a sufficient description of the data, and be referenced in a standard metadata catalog.
- Metadata catalog :
At this time, the metadata of the CarboEurope Regional Experiment datasets are filled in using the FGDC standard; once CarboEurope-IP has fixed its official standard, the metadata will be reformatted.
A minimal form is proposed here to insert or modify the metadata of each dataset.
- Data description :
The minimum information to provide is the location in space and time of each measurement (UTC time, Lat, Lon, Altitude) and its unit.
Since the height above ground is sometimes more important than the altitude (height above sea level), the vertical position is divided into 2 fields :
- Altitude (AMSL) of the ground at the measuring point.
- Height above ground of the measurement
Mobile measuring platforms with no means to get exact values of the heights defined above can use the pressure as a Z Coordinate.
A "Delta" is associated with each piece of information; it represents the volume or time over which a measurement is integrated. Deltas are 0 for instantaneous and in-situ measurements.
Conventions will be different for space and time :
- For space, the indication (Lat/Lon/height) is the center of the explored volume, and delta is the width of this volume.
- For time, the indication is the END of the measuring period, and delta is its duration.
For «emissions» files, the indication is the BEGINNING of the measuring period : for example, the «01» hour file contains the sum of emissions between 01:00 and 02:00 UT, etc.
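The space and time conventions above can be sketched as follows. This is only an illustration: the function names are hypothetical, and expressing the time delta in seconds is an assumption, since this page does not fix the delta unit.

```python
from datetime import datetime, timedelta, timezone


def measuring_period(stamp, delta_seconds, emissions=False):
    """Return (start, end) of the measuring period.

    stamp is the time written in the file and delta_seconds its delta
    (seconds are an assumption made for this sketch).
    Regular files: the stamp is the END of the period.
    Emissions files: the stamp is the BEGINNING of the period.
    """
    delta = timedelta(seconds=delta_seconds)
    if emissions:
        return stamp, stamp + delta
    return stamp - delta, stamp


def spatial_extent(center, delta):
    """Return (lower, upper) bounds along one axis.

    The indication is the center of the explored volume and delta its width.
    """
    half = delta / 2.0
    return center - half, center + half


# A one-hour average stamped 02:00 UTC covers 01:00 to 02:00.
end = datetime(2005, 5, 27, 2, 0, tzinfo=timezone.utc)
start, stop = measuring_period(end, 3600)

# The «01» emissions file also covers 01:00 to 02:00.
begin = datetime(2005, 5, 27, 1, 0, tzinfo=timezone.utc)
e_start, e_stop = measuring_period(begin, 3600, emissions=True)
```

With a delta of 0 (instantaneous or in-situ measurement), both functions return identical bounds.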
For the value, negative deltas are used as codes :
| Delta | Value | Comment |
| -1 | 0 | trace |
| -2 | v | measurement ≤ v |
| -4 | v | measurement < v |
| -8 | v | measurement ≥ v |
| -16 | v | measurement > v |
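A reading program could interpret these codes along the following lines. This is a minimal sketch: the names are hypothetical, and treating any non-negative delta as "the value is the measurement itself" is an assumption made for the example.

```python
# Negative value deltas and the relation they encode, as listed in the table.
VALUE_DELTA_CODES = {
    -1: "trace",
    -2: "<=",
    -4: "<",
    -8: ">=",
    -16: ">",
}


def interpret_value(value, delta):
    """Return (relation, value) for a value and its delta.

    A non-negative delta is read as a plain measurement (assumption);
    a negative delta must be one of the codes of the table.
    """
    if delta >= 0:
        return ("=", value)
    if delta not in VALUE_DELTA_CODES:
        raise ValueError(f"unknown negative delta code: {delta}")
    return (VALUE_DELTA_CODES[delta], value)


below_limit = interpret_value(0.5, -4)   # the measurement is < 0.5
trace = interpret_value(0, -1)           # trace amount
```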
A characterisation of the instrument is also necessary :
- Sensor type (or measuring principle)
- Sensor identification (manufacturer, model, serial num.)
- Processing method reference
If samples are grouped by acquisition set (a balloon launch, an aircraft flight), this has to be mentioned, and can be complemented by a comment or a description of the measuring period.
- Units and representations :
The chosen principle is to provide data in conservative units (e.g. humidity expressed as a mixing ratio rather than as relative humidity), whenever an exact transformation is possible.
When it is not possible (for example, LIDARs giving concentration profiles expressed in mol/volume, which would need a thermodynamic profile for conversion to mixing ratio), the data are provided in their exact original unit.
Wind is preferably expressed in speed and direction (meteorological conventions) rather than vector components.
Latitudes and longitudes are expressed in decimal degrees (rather than degrees, minutes and seconds).
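For data that are not yet in these representations, the two conversions can be sketched as below. The function names are hypothetical. It is assumed that u is the eastward and v the northward wind component, and that the meteorological direction is the direction the wind blows FROM, in degrees clockwise from north.

```python
import math


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds to decimal degrees.

    hemisphere is "N", "S", "E" or "W"; south and west are negative.
    """
    value = degrees + minutes / 60.0 + seconds / 3600.0
    if hemisphere in ("S", "W"):
        value = -value
    return value


def wind_speed_direction(u, v):
    """Convert vector components to (speed, direction).

    Direction is where the wind comes from, in degrees clockwise
    from north, in the range [0, 360).
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


lat = dms_to_decimal(48, 30, 0, "N")                # 48.5
speed, direction = wind_speed_direction(5.0, 0.0)   # westerly wind
```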
- File format :
We chose NOT to impose a specific file format, thinking that it would be more work for scientists coming from various fields to fit their data into one single format than for I.T. people to read various formats.
Here are the main recommendations for choosing your format :
- If a standard format exists in your field, is commonly used for data transfers, and reading programs are available, then you can use it. For example, the ISO format for operational regional air quality networks.
- You can also re-use a format defined for a former campaign, if you know that most instrument operators in your field already know how to write/read such a format. For example, wind profiler radars will use an enhanced version of the MAP format.
If you have to choose your own format, here are the recommendations :
- A coordination of similar instruments or platforms is recommended. For example, within the aircraft group, a common parameter list was defined.
If you have no naming convention for your dataset, you are encouraged to use the GCMD Thesauri.
- The only format which can be read and written by many software packages and applications (databases, plotting tools, spreadsheets) is the text file (i.e. ASCII) with a common delimiter (comma, semicolon, ...). Avoid proprietary formats (such as Excel, which may not be readable with another version or OS), and be aware that sending output-type formats (such as PostScript, PDF, JPEG) is not of much greater use than a paper output.
For the text format of real values, the use of a dot as decimal point is preferred over a comma (i.e. "23.2" instead of "23,2").
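As an illustration of such a text file, the sketch below writes a small delimited table with Python's standard csv module. The column names and the sample row are hypothetical and are not a format prescribed by this page. Python writes real values with a dot as decimal point whatever the locale of the machine.

```python
import csv
import io


def to_delimited_text(header, rows, delimiter=","):
    """Return the table as delimited ASCII text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical columns: UTC time, latitude, longitude, value.
header = ["time_utc", "lat", "lon", "value"]
rows = [["2005-05-27T01:00:00Z", 44.5, -1.25, 23.2]]
text = to_delimited_text(header, rows)
```

Passing delimiter=";" produces the semicolon-separated variant of the same file.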
- Compression :
The compression format, if used, should be common enough on both MS Windows and Unix machines.
Currently, suitable formats are : zip, tar and GNU zip (gz).
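Two of these formats can be produced with the Python standard library alone, as in the sketch below. The function names and the file name are hypothetical, and the data are kept in memory only to keep the example short.

```python
import gzip
import io
import zipfile


def gzip_text(text):
    """Compress ASCII text to GNU zip (gz) bytes."""
    return gzip.compress(text.encode("ascii"))


def gunzip_text(blob):
    """Decompress GNU zip (gz) bytes back to ASCII text."""
    return gzip.decompress(blob).decode("ascii")


def zip_text(name, text):
    """Return the bytes of a zip archive holding one text file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, text)
    return buffer.getvalue()


sample = "time_utc,value\n2005-05-27T01:00:00Z,23.2\n"
gz_blob = gzip_text(sample)
zip_blob = zip_text("sample.csv", sample)
```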