Large raster data processing
Large File Support (LFS)
Affects 32-bit systems, which are normally limited to 2 GB (2^31 bytes) per file. As a workaround, LFS can be enabled in GRASS GIS at compile time (see the sketch after the list below).
- GRASS GIS 6: much of the raster library and modules can be enabled for LFS during compilation
- GRASS GIS 7: raster and vector libraries are LFS enabled.
See also: Large File Support
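When building GRASS GIS 6 from source, LFS is requested through the configure script. A minimal sketch, showing only the relevant switch (all other configure options, which depend on your system and installed libraries, are omitted):

# request Large File Support at compile time, then build as usual
./configure --enable-largefile
make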
Workaround: Select a smaller area within a very large dataset before importing it into GRASS
Suppose that we have the complete ASTER GDEM world coverage (>22000 files) and we aim to build a mosaic of Europe and import it into GRASS. A nice reference on how to deal with ASTER GDEM is here. The very first step is to select, among the >22000 files, those covering our area of interest (Europe). To this aim, we can use gdaltindex to create an index of all the files:
gdaltindex an_index.shp *.tif
This command creates a polygon shapefile with the footprint of each raster as the polygon geometry and the name of the corresponding image file stored in the attribute table. After that, we just need to run a spatial selection on the shapefile and extract the filenames from the attributes of the selected polygons:
ogr2ogr -f CSV list.csv an_index.shp -spat xmin ymin xmax ymax
will produce a CSV file, from which we can easily copy the list of files into a list.txt.
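A minimal sketch of extracting the filenames into list.txt, assuming the tile index kept gdaltindex's default single "location" attribute, so the CSV consists of one header line followed by one filename per line:

# drop the CSV header line and keep only the filename column
tail -n +2 list.csv | cut -d',' -f1 > list.txt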
Here, our area of interest (subset) has the following boundaries:
north: N71: ymax = 72.0001389
south: N34: ymin = 33.9998611
west: W011: xmin = -11.0001389
east: E042: xmax = 43.0001389
At 100m this corresponds to 27.8 billion cells.
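With these boundary values, the spatial selection from above becomes (-spat expects xmin ymin xmax ymax):

ogr2ogr -f CSV list.csv an_index.shp -spat -11.0001389 33.9998611 43.0001389 72.0001389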
Create a mosaic
Now that we have a list of the files covering our area of interest, and since our files all share the same datum (WGS84), we can use gdalbuildvrt to create a virtual mosaic. This takes only a few seconds to run.
gdalbuildvrt -input_file_list list.txt mosaic.vrt
Also note that GDAL can read TIFF files inside zip/gz archives as well.
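For example, compressed tiles can be addressed through GDAL's virtual file systems; the file names below are only placeholders:

gdalinfo /vsizip/tile_archive.zip/tile_dem.tif
gdalinfo /vsigzip/tile_dem.tif.gz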
To make the mosaic available in GRASS, we first need to create a WGS84 (latitude-longitude) location; then we can register the VRT as a linked external raster with r.external (for a full import, r.in.gdal can be used instead):
r.external -r input=/path/to/mosaic.vrt output=mosaic
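As a quick check, and to work at the full mosaic extent, the computational region can then be aligned to the linked map:

g.region raster=mosaic -p     # set and print the region to match the mosaic
r.info map=mosaic             # show metadata of the linked map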
Troubleshooting
Number of open files limitation
When working with time series or multispectral data, you may encounter the following error message (from r.series, r.patch, etc.):
Too many open files
Many Linux distributions (and other operating systems, likewise) limit a process to 1024 simultaneously open files. The limit exists to prevent a process from (accidentally or deliberately) consuming too much memory: the file descriptor table is kept in kernel memory and cannot be swapped to disk. However, the limit can be raised. Note that, in GRASS GIS 7.0, each raster map requires two open files: one for the cell/fcell file and one for the null bitmap.
Resource limits are per-process, and inherited. The ulimit command (which is a shell built-in) changes the limits for the current shell process; the new limit will be inherited by any child processes.
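To see where you currently stand, the shell built-in can report both limits (-n is the open-files resource):

ulimit -Sn     # current soft limit on open files
ulimit -Hn     # current hard limit on open files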
System-wide change
On most Linux systems, resource limits are set at login by the pam_limits module according to the settings contained in /etc/security/limits.conf and /etc/security/limits.d/*.conf. You can edit those files if you have root privileges (e.g., via sudo), but you will need to log in again before any changes take effect. A reboot should not be necessary, since changes to limits.conf apply to any subsequent login. A hard limit cannot be increased for any existing non-root processes, or for processes descended from them.
Increasing a hard limit requires root privileges (or at least the CAP_SYS_RESOURCE capability), so the limits have to be set by the program which manages the login (login, xdm, sshd, etc.) after the user has identified themselves (so it knows which limits to set) but before the process changes its ownership from root to the logged-in user.
sudo su
vim /etc/security/limits.conf
Add in /etc/security/limits.conf something like this:
# Limit user nofile - max number of open files
*    soft    nofile    1500
*    hard    nofile    1800
Restart the user session (i.e., log out and log in again).
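After logging back in, the new soft limit should be visible in the shell (the value corresponds to the example limits.conf entries above):

ulimit -n     # should now report 1500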
Session-only change
To change the limits for an existing session, you may be able to use something like:
sudo bash
ulimit ...
su -c bash <username>
However, sudo will probably reset certain environment variables, particularly LD_LIBRARY_PATH, so you may need to reset those in the new shell.
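On recent Linux systems, the prlimit utility (from util-linux) is an alternative sketch for raising the open-files limit of an already running process; the PID and values below are only placeholders, and raising the hard limit still requires root:

sudo prlimit --pid 12345 --nofile=4096:4096     # set soft:hard open-files limit for process 12345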