Vector topology cleaning
Latest revision as of 13:02, 5 April 2020
Introduction
Q: What is the recommended way of dealing with errors found in vector data that were created without any topological restrictions (and saved as .shp)?
A: For import, try to find a snapping threshold for v.in.ogr that produces an error-free output (during import the module suggests a threshold). Ideally the output would not only be error-free, but the number of centroids would match the number of input polygons (both are reported by v.in.ogr). The min_area option of v.in.ogr could also help. The bulk of the cleaning should be done by v.in.ogr. After that, removing small areas with v.clean ... tool=rmarea, threshold in square meters, could help.
Example: For data which are mainly based on Landsat data, a threshold of 10 square meters could remove artefacts and preserve valid areas (a Landsat pixel of 30 m x 30 m covers about 900 square meters). The threshold needs to be determined empirically.
Note: there might not be a standard procedure that works for all data sources.
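The import-then-clean sequence described above can be sketched as follows. This is a minimal sketch, not a fixed recipe: it only builds the argument lists for v.in.ogr and v.clean, using the options named in the text (snap, min_area, tool=rmarea, threshold). The helper names, file names and map names are assumptions made for illustration, and the commands only work inside a running GRASS session.

```python
import subprocess


def import_cmd(src, out, snap, min_area=None):
    """Build the v.in.ogr call with a snapping threshold (in map units)."""
    cmd = ["v.in.ogr", f"input={src}", f"output={out}", f"snap={snap}"]
    if min_area is not None:
        # Optional: drop tiny areas already during import.
        cmd.append(f"min_area={min_area}")
    return cmd


def rmarea_cmd(src_map, out_map, threshold):
    """Build the v.clean call that removes small areas after import."""
    return [
        "v.clean",
        f"input={src_map}",
        f"output={out_map}",
        "tool=rmarea",
        f"threshold={threshold}",
    ]


def run(cmd):
    """Run one GRASS module; only meaningful inside a GRASS session."""
    subprocess.run(cmd, check=True)
```

Inside a GRASS session one might call run(import_cmd("landuse.shp", "landuse", "1e-4")) followed by run(rmarea_cmd("landuse", "landuse_clean", "10")), and then compare the reported number of centroids with the number of input polygons.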
A strategy to find a viable snapping threshold
A hint for snapping: for vector data with topological errors, v.in.ogr suggests a range of [1e-08, 1] for suitable snapping values. The exponent thus ranges from -8 to 0. Testing all possible values in this range obviously takes a lot of time.
Strategy: Bisect the exponent. Set low = -8 and high = 0, compute mid = (low + high) / 2 = -4, and test v.in.ogr with snap=1e$mid.
- If you still get errors, increase: set low to mid, get new mid with (low + high) / 2
- else, decrease: set high to mid, get new mid with (low + high) / 2
Continue until you have found the threshold where these warnings just disappear.
Note: Snapping is slow and uses quite a bit of memory because it needs a spatial search tree.
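The bisection strategy above could be scripted roughly like this. It is a sketch under stated assumptions: find_snap_exponent takes any function that reports whether an import with snap=10**exponent was free of errors, and it assumes that larger thresholds never bring errors back and that the import at the upper bound is clean. The helper import_is_clean is hypothetical: treating any "WARNING" on stderr as a remaining error is a heuristic, and the file and map names are placeholders.

```python
import subprocess


def find_snap_exponent(is_clean, low=-8.0, high=0.0, tol=0.5):
    """Bisect the exponent e of snap=10**e.

    Returns the smallest exponent found (within tol) for which
    is_clean(e) was True. Assumes the import at `high` is clean.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if is_clean(mid):
            high = mid  # no errors: try a smaller threshold
        else:
            low = mid   # still errors: a larger threshold is needed
    return high


def import_is_clean(exponent, src="input.shp", out="snap_test"):
    """Hypothetical check: re-import and look for warnings on stderr."""
    cmd = [
        "v.in.ogr",
        f"input={src}",
        f"output={out}",
        f"snap={10.0 ** exponent:g}",
        "--overwrite",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0 and "WARNING" not in result.stderr
```

Inside a GRASS session, find_snap_exponent(import_is_clean) would run at most a handful of imports instead of one per candidate value. Since each import with snapping is slow, a coarse tol keeps the number of runs small.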
Cleaning large network datasets
Q: How can I speed up topological cleaning (v.clean) in GRASS 6 for large network datasets (for example OpenStreetMap data)?
A: The improved v.clean version in GRASS 7 is much faster. Here are some hints, though:
GRASS 6: When breaking lines it is recommended to
- split the lines first in smaller segments with v.split using the vertices option. Then,
- run v.clean with 'tool=break'. After that,
- use v.build.polylines to merge lines again.
GRASS 7: Here this has become much easier. Use v.clean with the -c flag and 'tool=break' and 'type=line'. The 'rmdupl' tool is then automatically added, and the splitting and merging is done internally.
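The two variants can be written down as command lists, as in the sketch below. It only uses the modules, options and flag named above (v.split with vertices, v.clean with tool=break, v.build.polylines, and v.clean -c). The function names and the names of the intermediate maps are assumptions for illustration. The lists are meant to be passed to subprocess.run inside a GRASS session.

```python
def grass6_break_steps(src, out, max_vertices):
    """GRASS 6: split lines, break them, then merge them again."""
    split_map = f"{src}_split"    # hypothetical intermediate map name
    broken_map = f"{src}_break"   # hypothetical intermediate map name
    return [
        ["v.split", f"input={src}", f"output={split_map}",
         f"vertices={max_vertices}"],
        ["v.clean", f"input={split_map}", f"output={broken_map}",
         "type=line", "tool=break"],
        ["v.build.polylines", f"input={broken_map}", f"output={out}"],
    ]


def grass7_break_cmd(src, out):
    """GRASS 7: one v.clean call; -c adds the follow-up tools."""
    return ["v.clean", "-c", f"input={src}", f"output={out}",
            "type=line", "tool=break"]
```

Keeping the GRASS 6 steps as a list makes the order explicit: each step reads the map written by the previous one.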
Cleaning zero length lines
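In order to remove all lines with zero length, run v.clean with the rmline tool (the map names below are placeholders):
v.clean input=yourmap output=yourmap_clean type=line tool=rmline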
Cleaning patched polygons
Q: How can I patch two adjoining area maps which have been digitized separately and correct the topology? I observe that the shared polygon boundaries do not perfectly match... I need to clean topology.
A: You can use v.clean for this.
Tools to consider:
- snap,bpol,rmdupl,break,rmdupl,rms
- the threshold (in map units) should be very small
Example (Lat-Long):
v.in.ogr natural_earth/ne_110m_admin_0_countries.shp out=country_boundaries snap=0.0001 TODO: FIX THIS
[Figure: polygon import from SHAPE file]
[Figure: result after v.clean applied]
Hints
- In recent GRASS GIS 7 versions, snapping thresholds for unclean polygons are suggested to the user when using v.in.ogr.
- Note that the snap threshold is given in map units, e.g. meters. To select a reasonable value, consider the distance between vertices in the map, i.e. the level of detail. An overly aggressive threshold can ruin the entire geometry, while a threshold that is orders of magnitude below the map's level of detail changes nothing (e.g. trying to fix cadastral maps at picometer level will not change much).
- If the input polygons are supposed to not overlap each other, the number of centroids should be identical to the number of input polygons. If that is not the case, more topological cleaning is needed.
- If the input polygons have logical errors, for example when the same landuse polygon is present more than once, this cannot be cleaned automatically with v.in.ogr or v.clean. You can investigate overlapping areas in the imported vector with 'd.vect yourmap type=area layer=2' (only overlapping areas have a category in layer 2 after import). Additionally, you may show the centroids of layer=2 to find tiny overlapping areas more easily with 'd.vect yourmap type=centroid layer=2'.
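The centroid check from the hints can also be done in a script. The sketch below assumes the key=value topology summary printed by 'v.info -t' (lines such as centroids=6800). The function names are made up for illustration, and the number of input polygons has to be taken from the v.in.ogr import report.

```python
def parse_topology(text):
    """Parse key=value lines (as printed by 'v.info -t') into a dict."""
    counts = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            counts[key.strip()] = int(value)
    return counts


def missing_centroids(topology_text, n_input_polygons):
    """Number of input polygons that did not end up with a centroid.

    A result of 0 is the ideal case for non-overlapping input.
    """
    return n_input_polygons - parse_topology(topology_text)["centroids"]
```

Inside a GRASS session the text could come from subprocess.run(["v.info", "-t", "map=yourmap"], capture_output=True, text=True, check=True).stdout. A non-zero result suggests that more cleaning is needed or that the input contains overlapping or duplicate polygons.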
Q: How about self-intersecting lines and boundaries?
A: In the GRASS GIS topological model self-intersecting lines are allowed, self-intersecting boundaries are not. Self-intersecting lines are fine for the v.net modules, for example to represent a bridge of a secondary road over a highway.
Note: Some modules do not like self-intersecting lines, e.g. problems are expected with v.buffer.
Q: I've imported a shapefile with 6842 input polygons and after importing (with 1e-12 snapping threshold) there are 6800 centroids. Further cleaning does not change the topology. Why?
A: It could be that some of the input polygons are exact duplicates. v.clean can remove them.
Q: Can I ignore the areas without centroids?
A: Yes, these are typically holes in polygons (islands).
Fixing centroids outside area
Q: v.clean reports: "WARNING: Number of centroids outside area: x"
A: ... todo (outline: run v.build with an error map to get the category of the offending centroid, then use v.edit to delete it)
See also
- Intro to vector data model (with drawing)
- Vector topology
- Vector Overlapping Areas
- Topology examples (GRASS Programmer's Manual)