Large File Support: Difference between revisions
Revision as of 07:03, 29 October 2006
(largely based on comments by Glynn Clements on the GRASS-dev mailing list)
The need
The standard C <stdio.h> file functions report file offsets and sizes as a long integer. On 32-bit systems a long overflows at 2 gigabytes. To support files bigger than this, you need Large File Support (LFS). LFS is currently implemented in GRASS only in libgis (i.e. there is support for reading and writing raster maps, but few import/export modules or vector functions have it).
The issues
The problem is that ftell() returns the result as a (signed) long. If the result won't fit into a long, it returns -1 (and sets errno to EOVERFLOW).
This can only happen if you also set _FILE_OFFSET_BITS to 64 so that fopen() is redirected to fopen64(), otherwise fopen() will simply refuse to open files larger than 2GiB (apparently, this isn't true on some versions of MacOSX, which open the file anyhow then fail on fseek/ftell once you've passed the 2GiB mark).
If you want to obtain the current offset for a file whose size exceeds the range of a signed long, you instead have to use the (non-ANSI) ftello() function, which returns the offset as an off_t.
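As a minimal sketch of the point above (not GRASS code), the following shows ftello() returning the offset as an off_t. The helper names current_offset() and print_offset() are made up for illustration, and the macros are defined in the file only so that the example compiles on its own; in a real build they would normally come from the compiler flags.

```c
/* Must be defined before any system header so that off_t is 64 bits wide
 * and fseeko()/ftello() are declared. */
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the current offset of fp as an off_t,
 * or -1 on error (ftello() sets errno in that case). */
off_t current_offset(FILE *fp)
{
    assert(fp != NULL);
    return ftello(fp);
}

/* Hypothetical helper: printf() has no conversion for off_t, so cast to
 * long long before printing. Returns the result of fprintf(). */
int print_offset(FILE *out, off_t offset)
{
    return fprintf(out, "%lld\n", (long long) offset);
}
```

The cast to long long when printing is a common workaround, since the width of off_t depends on the platform and on whether _FILE_OFFSET_BITS is set.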
TODO: But before we do that, we would need to add configure checks so that we don't try to use ftello() on systems which don't provide it.
There isn't a truly portable solution. Some platforms might not even have an integral type larger than 32 bits.
The most practical solution is to use ftello() if it's available.
This will require some configure checks. These are simple enough to implement; it's the design which is problematic (as usual).
Unlike most HAVE_FOO checks, fseeko() isn't a simple have/don't-have check. Rather, it's usually the case that the function is available only when certain macros are defined (e.g. _LARGEFILE_SOURCE).
That gives rise to the question of what we check for, how we check for it, how we pass that information to the code, and how we use it.
Rather than try to come up with some infrastructure which allows us to use LFS in a piecemeal fashion, it would be preferable to clean up the GRASS code so that we can enable LFS globally. Then, we can just add:
-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
to CPPFLAGS, and not have to worry about adding the necessary macros to individual files. Any HAVE_* checks then become simple have/don't-have checks.
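Once LFS is enabled globally, a have/don't-have check could be used roughly as in the sketch below. This is an illustration under stated assumptions, not existing GRASS code: the macro name HAVE_FSEEKO is assumed to be provided by configure (it is defined here only so the example compiles stand-alone), and the wrapper names file_tell() and file_seek() are hypothetical.

```c
/* In a real build these would come from CPPFLAGS, not from the source file. */
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Assumption: configure would normally set this in config.h. */
#ifndef HAVE_FSEEKO
#define HAVE_FSEEKO 1
#endif

/* Hypothetical wrapper: current offset as an off_t, or -1 on error. */
off_t file_tell(FILE *fp)
{
    assert(fp != NULL);
#if HAVE_FSEEKO
    return ftello(fp);
#else
    /* Fallback: limited to the range of a signed long. */
    return (off_t) ftell(fp);
#endif
}

/* Hypothetical wrapper: seek to an off_t offset; 0 on success, -1 on error. */
int file_seek(FILE *fp, off_t offset, int whence)
{
    assert(fp != NULL);
#if HAVE_FSEEKO
    return fseeko(fp, offset, whence);
#else
    /* Fail explicitly rather than silently truncating the offset. */
    if (offset > LONG_MAX || offset < LONG_MIN) {
        errno = EOVERFLOW;
        return -1;
    }
    return fseek(fp, (long) offset, whence);
#endif
}
```

In the fallback branch an out-of-range offset is rejected with EOVERFLOW, so the failure is explicit instead of seeking to a truncated position.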
Coding LFS in GRASS
Currently the --enable-largefile switch only enables LFS in libgis, not anywhere else.
[Although config.h includes definitions to enable LFS automatically, those definitions are currently inactive. This is probably a good thing; a lot of GRASS' code isn't LFS-aware, and explicit failure is preferable to silently corrupting data.]
To enable LFS elsewhere, you need to manually add -D_FILE_OFFSET_BITS=64 to the compilation flags. The simplest approach is to add to the module's Makefile:
ifneq ($(USE_LARGEFILES),)
EXTRA_CFLAGS = -D_FILE_OFFSET_BITS=64
endif
and include config.h before all other header files in the code:
#include <grass/config.h>
#include <stdio.h>
#include <string.h>
#include <grass/gis.h>
...
int versus off_t
You may as well just use "off_t filesize" unconditionally. An "off_t" will always be large enough to hold a "long".
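A minimal sketch of using "off_t filesize" unconditionally is shown below. The function name stream_size() is hypothetical, and the macros are defined in the file only to keep the example self-contained; in a GRASS module they would be expected to come from the Makefile flags described above.

```c
/* Normally supplied through the compilation flags. */
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the size of an open stream in bytes as an
 * off_t, restoring the original position afterwards. Returns -1 on error. */
off_t stream_size(FILE *fp)
{
    off_t saved, filesize;

    assert(fp != NULL);

    saved = ftello(fp);                 /* remember where we were */
    if (saved < 0)
        return -1;
    if (fseeko(fp, 0, SEEK_END) != 0)   /* jump to the end of the file */
        return -1;
    filesize = ftello(fp);              /* offset at the end is the size */
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return -1;
    return filesize;
}
```

Declaring filesize as off_t rather than int or long means the same code works whether or not the file exceeds 2 GiB, provided LFS is enabled at compile time.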
LFS-safe libs and module list
- libgis
LFS works in progress
- r.in.xyz
- r.terraflow (integrate current LFS support into GRASS's --enable-largefile ./configure switch)
(r.terraflow creates huge temporary files which can easily go over 2GB)
LFS wish list
High priority modules to get LFS
- r.in.*
- r.out.*
- GRASS GDAL plugin (??)
- v.surf.rst
- v.surf.idw(2)
- vector libs (limited by number of features)
- v.in.ascii -bt (without topology)
- DB libs