Large vector data processing
Option to reduce memory consumption for vector topology
From M. Metz's email.
The topological vector format of GRASS can require a large amount of memory for larger vectors. This can cause out-of-memory errors, or freeze the machine if a vector module uses up all system memory and starts swapping.
The largest component of GRASS vector topology is the spatial index, which can exceed available system memory with larger vectors, e.g. LiDAR point clouds with hundreds of millions of points, or very large and detailed land cover/land use polygons. In GRASS 7, I have added a new option to use a file-based version of the spatial index. It is activated by setting the new shell environment variable GRASS_VECTOR_LOWMEM, e.g. in bash
export GRASS_VECTOR_LOWMEM=1
and deactivated with
unset GRASS_VECTOR_LOWMEM
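For a single run, the variable can also be scoped to one command instead of the whole shell session. The following is a minimal sketch, not part of the original email: the helper name run_lowmem and the map name my_large_vector are hypothetical, and it assumes the module is started as an external program from inside a GRASS session.

```shell
# Hypothetical helper: run one command with GRASS_VECTOR_LOWMEM set,
# without exporting the variable into the current shell.
run_lowmem() {
    env GRASS_VECTOR_LOWMEM=1 "$@"
}

# Example use inside a GRASS session (map name is hypothetical):
#   run_lowmem v.build map=my_large_vector
```

Because env only changes the environment of the command it starts, later modules in the same session keep using the memory-based spatial index unless the variable is exported as shown above.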
As long as the spatial index fits into system memory, the file-based version is, as expected, somewhat slower than the memory-based version. For example, v.build takes about 1.5 times as long for larger vectors (> 100,000 areas). For smaller vectors (e.g. < 50,000 areas), there is not much difference. If the spatial index does not fit into system memory, the memory-based version will at some stage abort with an out-of-memory error or go into swap space, whereas the file-based version will very likely complete successfully and not freeze the machine.
Of the vector cleaning tools, breaking polygons (used when importing polygons; the same function is called by v.clean tool=bpol) has the highest memory consumption, because all unique vertices of all boundaries are loaded into a search index. I have added a file-based version for breaking polygons in GRASS 7, which is activated as described above. When importing a test vector with 258,000 polygons, memory consumption while breaking polygons dropped from 3.9 GB to 90 MB, that is, by about 98%. Large file support (LFS) is recommended for the new file-based version.
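The quoted reduction can be checked with a short calculation. This is only an arithmetic sketch on the two figures given above. It assumes 1 GB = 1000 MB, and the variable names are illustrative.

```shell
# Figures from the test import described above.
before_mb=3900   # 3.9 GB, assuming 1 GB = 1000 MB
after_mb=90      # 90 MB with the file-based version

# Relative reduction in percent; awk is used because bash arithmetic is integer-only.
reduction=$(LC_ALL=C awk -v b="$before_mb" -v a="$after_mb" \
    'BEGIN { printf "%.1f", (b - a) / b * 100 }')

echo "memory reduction: ${reduction}%"   # prints "memory reduction: 97.7%"
```

The result rounds to the 98% stated in the text.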