Large vector data processing

* [[GRASS GIS Performance]]
* [[Large raster data processing]]


[[Category: massive data analysis]]
[[Category: vector]]

Latest revision as of 14:54, 8 May 2017

Option to reduce memory consumption for vector topology

From M.Metz's email.

The topological vector format of GRASS can require quite a bit of memory for larger vectors, which can cause problems such as out-of-memory errors or freezing the machine if a vector module uses up all system memory and goes into swap space.

The largest component of GRASS vector topology is the spatial index, which can exceed available system memory for larger vectors, e.g. LiDAR point clouds with hundreds of millions of points, or very large and detailed land cover/land use polygons. In GRASS 7, I have added a new option to use a file-based version of the spatial index. It is activated by setting the new shell environment variable GRASS_VECTOR_LOWMEM, e.g. in bash:

export GRASS_VECTOR_LOWMEM=1

and deactivated with

unset GRASS_VECTOR_LOWMEM
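
If the file-based index is only wanted for one heavy command, the variable can also be set for that command alone instead of for the whole session. The following is a minimal sketch, not part of GRASS: the helper name run_lowmem and the map name in the usage comment are hypothetical, and the example assumes it is run inside a GRASS session.

```shell
# Hypothetical helper (not part of GRASS): run a single command with
# GRASS_VECTOR_LOWMEM set, without changing the calling shell.
run_lowmem() {
    # The subshell keeps the export local to this one invocation.
    ( export GRASS_VECTOR_LOWMEM=1; "$@" )
}

# Usage inside a GRASS session (map name is a placeholder):
#   run_lowmem v.build map=my_large_vector
```

Because the export happens in a subshell, later commands in the same session keep using the memory-based index unless the variable is exported separately.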

As long as the spatial index fits into system memory, the file-based version is, as expected, somewhat slower than the memory-based version: v.build, for example, takes about 1.5 times as long for larger vectors (> 100,000 areas). For smaller vectors (e.g. < 50,000 areas), there is not much difference. If the spatial index does not fit into system memory, the memory-based version will at some stage abort with an out-of-memory error or go into swap space, whereas the file-based version will very likely complete successfully and not freeze the machine.

Of the vector cleaning tools, breaking polygons (used when importing polygons, the same function as called by v.clean tool=bpol) has the highest memory consumption because all unique vertices of all boundaries are loaded into a search index. I have added a file-based version for breaking polygons in GRASS 7, which is activated as described above. When importing a test vector with 258,000 polygons, memory consumption while breaking polygons dropped from 3.9 GB to 90 MB, that is, by about 98%. Large file support (LFS) is recommended for the new file-based version.
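
The quoted reduction can be checked with a quick calculation. This sketch assumes decimal units (3.9 GB = 3900 MB), which the email does not state; with binary units the result still rounds to the same 98%.

```shell
# Relative memory reduction from 3.9 GB to 90 MB.
# Assumption: 1 GB = 1000 MB (the source does not say which unit is meant).
reduction=$(awk 'BEGIN { printf "%.1f", (1 - 90 / 3900) * 100 }')
echo "${reduction}%"   # prints 97.7%
```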

Relevant tickets

See also