Data model

We enforce the usage of xarray.Dataset and xarray.DataArray with specific naming conventions defined in the OPENSENSE OS data format conventions. This mainly concerns the naming of the variables that hold the coordinates. By enforcing this naming convention we simplify all functionality regarding spatial calculation and plotting.

Note that only longitude and latitude coordinates (in decimal degrees) are required to meet the OPENSENSE OS data format conventions. Additionally we require x and y (and site_0_x, etc. for line data), which should be projected coordinates, to be present in a xarray.Dataset for certain function that rely on distance calculations (e.g. for finding neighbors) because distance calculations are not correct when done with lon-lat in degrees. We provide a simple function to do the projection, but the user is free to choose which projection to use. Since we use the projected coordinates for distance calculation, it should preserve distances as good as possible in the region of interest.

Point data

This is an example of a dataset for point data with the required variables and their required names.

<xarray.Dataset>
Dimensions:    (time: 219168, id: 134)
Coordinates:
  * time       (time) datetime64[ns] 2016-05-01T00:05:00 ...
  * id         (id) <U6 'ams1' 'ams2' 'ams3' ...              # Station ID
    lat        (id) float64 52.31 52.3 52.31 ...              # Latitude in decimal degrees
    lon        (id) float64 4.671 4.675 4.677 4.678 ...       # Longitude in decimal degrees
    x          (id) float64 2.049e+05 2.052e+05 ...           # Projected x coordinates
    y          (id) float64 5.804e+06 5.803e+06 ...           # Projected y coordinates

Line data

When working with CML data we assume the following data structure:

<xarray.Dataset> Size: 128kB
Dimensions:       (cml_id: 359, time: 31)
Coordinates: (12/15)
    sublink_id    <U9 36B ...
  * cml_id        (cml_id) int64 3kB 10001 10002 10003 ... 10362 10363 10364
    site_0_lat    (cml_id) float64 3kB 57.7 57.73 57.69 ... 57.65 57.66 57.71
    site_0_lon    (cml_id) float64 3kB 12.0 11.98 11.97 ... 12.12 12.03 12.01
    site_1_lat    (cml_id) float64 3kB 57.7 57.72 57.69 ... 57.66 57.63 57.71
    site_1_lon    (cml_id) float64 3kB 11.99 11.97 11.98 ... 12.14 11.97 11.98
  * time          (time) datetime64[ns] 248B 2015-07-25T12:30:00 ... 2015-07-...
    site_0_x      (cml_id) float64 3kB 6.785e+05 6.776e+05 ... 6.792e+05
    site_0_y      (cml_id) float64 3kB 6.4e+06 6.402e+06 ... 6.394e+06 6.4e+06
    site_1_x      (cml_id) float64 3kB 6.783e+05 6.77e+05 ... 6.778e+05
    site_1_y      (cml_id) float64 3kB 6.399e+06 6.402e+06 ... 6.401e+06
Data variables:
    R             (time, cml_id) float64 89kB ...

Here, site_0_x and site_0_y are projected coordinates of site_0_lon and site_0_lat. Typically only the lon-lat coordinates are given. But we rely on the projected coordinates for distance calculations.

For SML data only the ground site has a lon-lat coordinate pair. With the info on the longitude of the geostationary satellite it is point to, we can calculate elevation and azimuth. With a given melting layer height, e.g. taken from atmospheric model output, we can then derive the path that is passing through rain, where most of the path attenuation is caused. For a given melting layer height we can then assign a virtual site_1_x and site_0_y which use for plotting, see PR71.

Gridded data

For gridded data, mostly weather radar data in our applications, we assume the following data structure:

<xarray.Dataset> Size: 484kB
Dimensions:          (time: 31, x: 37, y: 48)
Coordinates:
  * time             (time) datetime64[ns] 248B 2015-07-25T12:30:00 ... 2015-...
  * x                (x) float64 296B -1.542e+05 -1.522e+05 ... -8.22e+04
  * y                (y) float64 384B -3.413e+06 -3.415e+06 ... -3.507e+06
    lat              (y, x) float32 7kB 57.21 57.21 57.21 ... 58.06 58.06 58.06
    lon              (y, x) float32 7kB 11.41 11.45 11.48 ... 12.59 12.62 12.66
    x_grid           (y, x) float64 14kB 6.457e+05 6.478e+05 ... 7.157e+05
    y_grid           (y, x) float64 14kB 6.343e+06 6.343e+06 ... 6.441e+06
Data variables:
    rainfall_amount  (time, y, x) float64 440kB 0.01078 0.0 ... 0.121 0.05403

Here, x_grid and y_grid are the projected coordinates with the same 2D shape as lon and lat. Most often only lon and lat are provided. Note that x and y are only 1D arrays. They might define an equidistant 2D xy-grid but that is not a requirement in our data model. We rely on x_grid and y_grid when doing distance calculations.

Note that there are different ways to define the location of grid cells, e.g. the coordinates can define the grid centroid or the lower left corner. We take that into account in the grid intersection code in GridAtLines. But this is not yet taken into account in GridAtPoints.