Vector data cubes as a bridge between raster and vector worlds
- Duration:
- 30 minutes
Abstract
This talk introduces the concept of vector data cubes - multi-dimensional arrays where at least one dimension is composed of vector geometries - and its implementation in Python within a new library Xvec, built on top of Xarray, Shapely 2.0 and GeoPandas.
Description
Vector geospatial data, such as points, lines and polygons, are primarily stored in one-dimensional arrays; in Python, typically as a GeoPandas' GeoSeries column within a data frame. At the same time, raster data representing, among others, satellite imagery or digital terrain models (DTM) are multi-dimensional arrays with a number of dimensions being anywhere from 2 (e.g. DTM) to N (e.g. multi-spectral time series from Sentinel 2), in Python mostly captured as Xarray objects. The two, raster and vector, rarely meet, and if so, we either transform one to the other using rasterisation or vectorisation or link vector geometries to a raster index via a unique identifier. However, there is a space in between that would benefit from multi-dimensional arrays with support for vector geometry.
This talk presents a concept of vector data cubes - multi-dimensional arrays where at least one dimension is composed of vector geometries - and its implementation in Python using Xarray, Shapely 2.0 and GeoPandas under the hood. Using the new Xvec package extending Xarray's abilities, we can assign an array of vector geometries as dimension coordinates and use the geospatial capabilities within the Xarray object. A typical use case can be a time series of a land cover proportion per region, where one dimension is composed of vector region boundaries, other as land use classes and a third one capturing time. While the same can be stored as a long-form data frame, it is not an efficient format due to a large amount of data replication. Using the vector data cube instead brings both efficiency and convenience as the included vector dimension can be directly used for indexing, plotting, and other operations that depend on the vector data and for which we would need to switch between Xarray and GeoPandas.