Large raster data processing
Create a mosaic
Suppose that we have the complete ASTER GDEM world coverage (>22000 files) and want to build a mosaic of Europe and import it into GRASS. A nice reference on how to deal with ASTER GDEM is here. The very first step is to select, among the >22000 files, those covering our area of interest (Europe). To do this, we can use gdaltindex to create an index of all the files:
gdaltindex an_index.shp *.tif
This command creates a polygon shapefile in which each polygon is the footprint of one raster and the attribute table holds the name of the corresponding image file. After that, we just need to do a spatial select on the shapefile and extract the filenames from the attributes of the selected polygons:
ogr2ogr -f CSV list.csv an_index.shp -spat xmin ymin xmax ymax
This command produces a CSV file, from which we can easily copy the list of files into list.txt.
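Copying the file names out of the CSV can also be scripted. The following is a minimal sketch, not a definitive recipe: it assumes the file name is stored in the first CSV column (gdaltindex writes it to an attribute named location by default), that the first line is a header, and that no file name contains a comma. The helper name csv_to_list is hypothetical.

```shell
# Hypothetical helper: print the first column of a CSV file,
# skipping the header line and stripping any double quotes.
csv_to_list() {
    tail -n +2 "$1" | cut -d, -f1 | tr -d '"'
}

# Write list.txt only if list.csv from the ogr2ogr step is present.
if [ -f list.csv ]; then
    csv_to_list list.csv > list.txt
fi
```

It is worth opening list.txt afterwards to confirm that it holds one raster path per line, since that is the form gdalbuildvrt expects from -input_file_list.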
Our area of interest has the following boundaries:
north: N71: ymax = 72.0001389
south: N34: ymin = 33.9998611
west: W011: xmin = -11.0001389
east: E042: xmax = 43.0001389
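With these bounds substituted for the xmin ymin xmax ymax placeholders, the spatial selection could be written as follows. This is only a sketch: it assembles and prints the command so that the argument order (-spat takes xmin ymin xmax ymax) can be checked before running it. The variable name spat_cmd is an assumption made for illustration.

```shell
# Bounding box of the area of interest (values from the list above).
xmin=-11.0001389
ymin=33.9998611
xmax=43.0001389
ymax=72.0001389

# Assemble the ogr2ogr call; -spat expects xmin ymin xmax ymax in that order.
spat_cmd="ogr2ogr -f CSV list.csv an_index.shp -spat $xmin $ymin $xmax $ymax"

# Print the command for inspection; paste it into the shell to execute it.
echo "$spat_cmd"
```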
Now that we have a list of the files intersecting our area, and since our files are all in the same datum (WGS84), we can use gdalbuildvrt to create a virtual mosaic. This will take only a few seconds to run.
gdalbuildvrt -input_file_list list.txt mosaic.vrt
Also note that GDAL can read TIFF files inside zip or gzip archives, through its /vsizip/ and /vsigzip/ virtual file systems.
To import the mosaic into GRASS, we first need to create a WGS84 location, then:
r.external -r input=/path/to/mosaic.vrt output=mosaic
Number of files limitation: troubleshooting
Linux (like other operating systems) limits the number of files a process may have open at the same time, typically to 1024 by default. As a result, you may encounter the following error message:
Too many open files
However, this limit can be raised; see the r.series manual page: "Number of raster maps to be processed is given by the limit of the operating system."
Resource limits are per-process, and inherited. The ulimit command (which is a shell built-in) changes the limits for the current shell process; the new limit will be inherited by any child processes.
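As an illustration of the per-shell behaviour described above, the following sketch prints the current soft and hard limits on open files and then tries to raise the soft limit to the hard limit. It assumes a POSIX-style shell whose ulimit built-in accepts -S, -H and -n (as bash does); the variable names are arbitrary.

```shell
# Show the current soft and hard limits on open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft, hard limit: $hard"

# Try to raise the soft limit to the hard limit. The change applies to
# this shell and is inherited by any child process started from it.
if ulimit -Sn "$hard" 2>/dev/null; then
    echo "soft limit is now $(ulimit -Sn)"
else
    echo "could not raise the soft limit; keeping $soft"
fi

# Record the limit actually in effect after the attempt.
new_soft=$(ulimit -Sn)
```

An unprivileged user can raise the soft limit only as far as the hard limit; going beyond that requires changing the hard limit itself with root privileges.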
On most Linux systems, resource limits are set on login by the pam_limits module according to the settings contained in /etc/security/limits.conf and /etc/security/limits.d/*.conf.
You should be able to edit those files if you have root privileges via sudo, but you will need to log in again before any changes take effect.
To change the limits for an existing session, you may be able to use something like:
sudo bash
ulimit ...
su -c bash <username>
However, sudo will probably reset certain environment variables, particularly LD_LIBRARY_PATH, so you may need to reset those in the new shell.