GRASS GIS APIs: Difference between revisions

From GRASS-Wiki
Jump to navigation Jump to search
(minor shell fixes for G7; no need to use --o)
Line 19: Line 19:
Here is a simple example of such an application that extracts estimated stream locations from a digital elevation model, transforms them to vector format and the creates a 500m buffer around them:
Here is a simple example of such an application that extracts estimated stream locations from a digital elevation model, transforms them to vector format and the creates a 500m buffer around them:


  g.region rast=elevation
  g.region raster=elevation -p
  r.watershed elevation=elevation threshold=10000 stream=raster_streams --o
  r.watershed elevation=elevation threshold=10000 stream=raster_streams
  r.to.vect intput=raster_streams output=vector_streams type=line --o
  r.to.vect intput=raster_streams output=vector_streams type=line
  v.buffer input=vector_streams output=stream_buffers distance=500 --o
  v.buffer input=vector_streams output=stream_buffers distance=500


Just putting the above lines into one file that the operating system can read at the command line can already be considered "programming an application" and by just changing the name of the raster used as input the same program can be launched on different data. There are many advantages of creating such application instead of clicking through the different commands in a graphical user interface:
Just putting the above lines into one file that the operating system can read at the command line can already be considered "programming an application" and by just changing the name of the raster used as input the same program can be launched on different data. There are many advantages of creating such application instead of clicking through the different commands in a graphical user interface:
Line 63: Line 63:


  import grass.script as grass  
  import grass.script as grass  
  grass.run_command('g.region', rast='elevation')
  grass.run_command('g.region', raster='elevation')
  grass.run_command('r.watershed', elevation='elevation', threshold=10000, stream='raster_streams', overwrite=True)
  grass.run_command('r.watershed', elevation='elevation', threshold=10000, stream='raster_streams', overwrite=True)
  grass.run_command('r.to.vect', input='raster_streams', output='vector_streams', type='line', overwrite=True
  grass.run_command('r.to.vect', input='raster_streams', output='vector_streams', type='line', overwrite=True

Revision as of 18:42, 31 January 2015

Introduction to this document

GRASS GIS is in continues development and this development is what keeps GRASS GIS alive and at the forefront of GIS evolution. From users stringing together a series of modules to build a novel processing chain to developers implementing new cutting-edge algorithms in the core of the GRASS GIS C source code there are many different ways to develop with and for GRASS GIS. This document aims at clarifying the role of the different application programming interfaces (APIs) available. For each of the different APIs, it presents typical use cases and introduces the basic logic of the API. For more detailed information, please look at more detailed introduction on GRASS GIS and Development and GRASS GIS and Python as well as the programming manuals for the C and Python APIs.

The GRASS GIS environment

Note that all of the below code examples require that they are run with necessary GRASS GIS environment variables defined in order to allow the system to find the relevant modules and libraries. You can either set up this environment by launching the GRASS GIS startup script or by manually setting up your environment.

GRASS GIS Modules as "functions"

GRASS in itself is not a monolithic application, but rather a collection of over 300 applications, called modules, following the Unix philosophy with the idea that each module does one thing and does it well, even if that thing is trivial, and that real power comes from combining these tools.

A consequence of this principle is that each module is autonomous in its functioning, including its memory management and error handling. In other words, as GRASS GIS is not a monolithic application, GRASS as such cannot crash, only individual modules can.

Module output in the form of new maps is automatically stored in the GRASS GIS Database and can thus be reused by follow-up modules. Other types of outputs can either be stored in temporary files to be read later by other modules or data can be communicated between modules via the standard streams, i.e. the standard output and input.

With this structure in mind, one can argue that GRASS GIS in itself is an application programming interface with each module playing the role of a "function". Chaining these functions thus allows any user to create an application of any level of desired complexity. The description of this API can be found in the module man pages or on the command line via the --help parameter.

Here is a simple example of such an application that extracts estimated stream locations from a digital elevation model, transforms them to vector format and the creates a 500m buffer around them:

g.region raster=elevation -p
r.watershed elevation=elevation threshold=10000 stream=raster_streams
r.to.vect intput=raster_streams output=vector_streams type=line
v.buffer input=vector_streams output=stream_buffers distance=500

Just putting the above lines into one file that the operating system can read at the command line can already be considered "programming an application" and by just changing the name of the raster used as input the same program can be launched on different data. There are many advantages of creating such application instead of clicking through the different commands in a graphical user interface:

  • archiving your work flow
  • making your work flow easily reproducible
  • being able to send your work flow to others

However, in this form one is limited to just chaining modules without any control flow such as conditional execution of repeated execution of certain parts of the code. It can, thus be interesting to embed the calls to GRASS GIS modules in a program written in a programming language. Many programming language allow system calls which enable the programmer to call the GRASS GIS modules while having at the same time all the functions of the language at hand.

Here are some examples of such system calls (using the second line of the above example) in different programming languages:

  • Python
import subprocess
subprocess.call('r.watershed elevation=elevation threshold=10000 stream=raster_streams', shell=True)
  • C
void main()
 {
   system("r.watershed elevation=elevation threshold=10000 stream=raster_streams");
 }
  • Perl
system "r.watershed elevation=elevation threshold=10000 stream=raster_streams";

These system calls are easy to handle when no output is expected from the GRASS module. When output needs to be collected then the programming task already becomes a little harder unless you know what you are doing. It is for this reason that the Python GRASS libraries where developed that are explained in the next section.

The Python scripting library

As was mentioned at the end of the previous section, just calling GRASS GIS modules via system calls out of a language can come very handy, but it is also not always straightforward to deal with the output of these modules. As a first solution to that, the GRASS Python Scripting Library was introduced in GRASS 6.

The aim of this library is to provide a light-weight, easy to learn toolkit that eases the interaction with GRASS GIS modules from Python. It mainly offers specialized wrapper functions to facilitate calling grass modules and capturing their output, as well as a few specific functions for frequent tasks (which themselves are again generally just wrapper functions around module calls). It allows to embed the use of GRASS GIS modules into the larger Python environment, without creating a complete pythonic access to the entire GRASS GIS API. It was used to transform all the shell scripts in the GRASS GIS source code to Python scripts in GRASS7.

The library is a perfect starting point for beginners who just want to write simple scripts without wanting to dive too deeply into Python philosophy and style.

Here is the same example as above using the run_command() function. This shows that the syntax stays very close to the actual GRASS GIS module calls.

import grass.script as grass 
grass.run_command('g.region', raster='elevation')
grass.run_command('r.watershed', elevation='elevation', threshold=10000, stream='raster_streams', overwrite=True)
grass.run_command('r.to.vect', input='raster_streams', output='vector_streams', type='line', overwrite=True
grass.run_command('v.buffer', input='vector_streams', output='stream_buffers', distance=500, overwrite=True)

There are a series of different functions to call modules depending on how output should be handled:

#get the current region settings as a list of parameters one per line (can then be treated with the Python splitlines() function
>>> grass.read_command('g.region', flags='g')
'n=228500\ns=215000\nw=630000\ne=645000\nnsres=10\newres=10\nrows=1350\ncols=1500\ncells=2025000\n'
#get current region settings as a python dictionary:
>>> grass.parse_command('g.region', flags='g')
{'rows': '1350', 'e': '645000', 'cells': '2025000', 'cols': '1500', 'n': '228500', 's': '215000', 'w': '630000', 'ewres': '10', 'nsres': '10'}
#or using a dedicated function as a shortcut (which also directly transforms all numbers from strings to numeric format)
>>> grass.region()
{'rows': 1350, 'e': 645000.0, 'cells': 2025000, 'cols': 1500, 'n': 228500.0, 's': 215000.0, 'w': 630000.0, 'ewres': 10.0, 'nsres': 10.0}

For a description of the different functions to call modules, see the relevant section on the dedicated wiki page.

As mentioned, this library is meant as a small toolkit for making it easier to call GRASS GIS modules from Python. It does not, however, offer a real pythonic access to GRASS GIS, and no access to the underlying functions. This is the role of the pygrass library discussed below.

Graphical programming with the GRASS GIS modeler

Another easy entry into programming with the GRASS GIS Python scripting library is the graphical modeler. Any model created graphically can also be saved as a Python script. For more details see the dedicated wiki page.

The PyGRASS API

The PyGRASS API provides two layers of access to GRASS:

  1. An access to GRASS modules in the same line as the Python scripting library
  2. An access to the underlying functions of the C-library thus giving access to GRASS algorithms and geometries directly from Python

Accessing GRASS modules

The first layer, the interface to GRASS GIS modules provides a unified, object-oriented access to GRASS GIS modules. Through its design modules can be called quite similarly to how they would be called on the command line:

>>> from grass.pygrass.modules import Module
>>> Module('g.region', flags='p')
projection: 99 (Lambert Conformal Conic)
zone:       0
datum:      nad83
ellipsoid:  a=6378137 es=0.006694380022900787
north:      318500
south:      -16000
west:       124000
east:       963000
nsres:      500
ewres:      500
rows:       669
cols:       1678
cells:      1122582
Module('g.region')

This is quite similar to the GRASS GIS Python scripting library's run_command() function. However, PyGRASS does not provide wrappers such as the read_command() and parse_command() and thus the developer has to handle output in the same way as it would be handled at the command line, i.e. by manipulating the standard input and standard output streams.

In order to mimic even closer the GRASS GIS command line, PyGRASS offers shortcuts to modules which are equivalent to the GRASS GIS modules calls:

>>> from grass.pygrass.modules.shortcuts import general as g
>>> g.region(flags='g')

>>> from grass.pygrass.modules.shortcuts import vector as v
>>> v.info('schools')

At the same time, the PyGRASS library allows to treat modules as objects and provides a series of functions for these objects:

>>> from grass.pygrass.modules import Module
>>> gregion=Module('g.region', flags='g')
>>> gregion
Module('g.region')
>>> gregion.name
'g.region'
>>> gregion.description
'Manages the boundary definitions for the geographic region.'
>>> gregion.run()
n=318500
s=-16000
w=124000
e=963000
nsres=500
ewres=500
rows=669
cols=1678
cells=1122582
Module('g.region')
>>> gregion.inputs['raster'].value='elevation'
Module('g.region')

In all, this API to module calls thus allows to integrate modules into a Python object oriented logic and manage them in more flexible ways, but it also demands a bit more understanding from the programmer concerning output handling.

Accessing low-level C-functions

It is the second layer of the PyGRASS library that sets it completely apart from the GRASS GIS Python scripting library. This makes it possible to directly access geometry features and a selection of low-level C-functions and thus to program a complex GRASS GIS application in Python.

In addition to the modules class mentioned above, the library is divided into different packages. The most important are raster, vector and gis. The latter offers the possibility to manage the GRASS database (location, mapset, etc) as well as the working region. The two other packages offer functions to handle the respective map types, but also a direct access to the content of such maps.

Here is an example of accessing a vector point map and adding a new point:

>>> from grass.pygrass.vector import VectorTopo
>>> schools = VectorTopo('schools')
>>> schools.open(mode='rw')
>>> schools.num_primitive_of('point')
167

>>> from grass.pygrass.vector.geometry import Point
>>> new_school = Point()
>>> new_school.x = 643550
>>> new_school.y = 216200
>>> schools.write(new_school)
>>> schools.close()

>>> schools.open(mode='r')
>>> schools.num_primitive_of('point')
168

>>> for school in schools:
...     print(school.id, school)
...
(1, Point(633649.285674, 221412.944348))
(2, Point(628787.129283, 223961.620521))
[...]
(167, Point(650870.509540, 247064.343249))
(168, Point(643550.000000, 216200.000000))
>>> schools.close()

The [grass.osgeo.org/grass71/manuals/libpython/pygrass_raster.html raster package] of the library offers different access types to raster maps: by row, by segment, read-only, read-and-write, as NumPy array. One can then treat the raster map as any matrix of numbers and write the results of treatments into a new, or the same raster map.

The PyGRASS library thus offers a more sophisticated access to GRASS GIS data and GRASS GIS functions. It contains the necessary functions to read and write raster and vector data that can then be treated with any Python tools the developer wants. It is conceived in a pythonic logic and thus allows a quite seamless integration of GRASS GIS into Python applications.

The C-API

The GRASS GIS core C-library has been in constant development for over 30 years. The fundamental functions were well-designed from the start and many have not changed much since then, but many more functionalities have been added. A selection of major improvements to he C-library were the introduction of floating-point and null support (GRASS 5), a completely new, quite sophisticated vector library including network analysis (GRASS 6), large-file support, significant performance optimization and much better cross-platform usability (GRASS 6-7).

The GRASS GIS C-API provides access to the different data types and related functions. There are three main core libraries:

  • gis: fundamental operations, data structures and GRASS database management
  • raster: creation, handling and processing of raster data
  • vector: creation, handling and processing of vector data

The list of the other libraries is quite long (see the Programmer's Manual for the complete list) and includes functions for different mathematical calculations, the treatment of satellite imagery, displaying maps, projection handling, 3D-support, etc.

A consistent naming scheme is used to ease the identification of the source of a particular function, e.g.

  • G_ = gis library
  • Rast_ = raster handling
  • Vect_ = vector handling
  • I_ = satellite imagery
  • PNG_ = png display driver

etc.

The libraries also define a whole host of data structures, of which some of the most important are:

  • Cell_head containing metadata concerning a raster map or a region
  • Map_info containing metadata concerning a vector map
  • field_info containing information about a vector map database connection

etc.

It is beyond the scope of this page to go into detail about the use of the the C-API. This is the role of the Programmer's Manual. Using the C-API to program new modules for GRASS GIS demands a good knowledge of the C-language. The best way to get a first impression on how to use the C-API is to either find an existing module that is similar to the one you want to program, or to start with the two example modules r.example and v.example, including the related Makefile. Another great starting point, combining both raster and vector library usage in a short program is the module v.to.rast3 of which the core loop is reproduced below for illustration (note the use of the gis (G_), 3D-raster (Rast3d_), vector (Vect_), attribute database (db_ ) libraries.

   nlines = Vect_get_num_lines(&Map);
   for (line = 1; line <= nlines; line++) {
       int type, cat, depth, row, col, ret;
       double value;
 
       G_percent(line, nlines, 2);
 
       type = Vect_read_line(&Map, Points, Cats, line);
       if (!(type & GV_POINT))
           continue;

       Vect_cat_get(Cats, field, &cat);
       if (cat < 0) {
           continue;
       }
       /* Check if the coordinates are located in the cube */
       if (!Rast3d_is_valid_location(&(map->region), Points->y[0], Points->x[0], Points->z[0])) {
           continue;
       }
       /* Convert the north, east and top coorindate into row, col and depth*/
       Rast3d_location2coord2(&(map->region), Points->y[0], Points->x[0], Points->z[0], &col, &row, &depth);

       if (ctype == DB_C_TYPE_INT) {
           int ivalue;

           ret = db_CatValArray_get_value_int(&cvarr, cat, &ivalue);
           value = ivalue;
       }
       else if (ctype == DB_C_TYPE_DOUBLE) {
           ret = db_CatValArray_get_value_double(&cvarr, cat, &value);
       }

       if (ret != DB_OK) {
           G_warning(_("No record for line (cat = %d)"), cat);
           continue;
       }

       G_debug(3, "col,row,depth,val: %d %d %d %f", col, row, depth, value);

       Rast3d_put_float(map, col, row, depth, (float)value);
   }

The C-API is thus meant for people who are willing to learn C-language programming and to spend some time understanding the functions and data structures of the API. The big advantage of the use of this API is to be able access the data directly and to program fast routines for their processing, allowing the implementation of new algorithms, or avoiding a possible performance penalty that chaining existing modules might entail.

Easy user interace creation

Whatever the API you chose, when writing GRASS GIS scripts it is very easy to create a user interface, including a graphical user interface, to your script in a semi-automatic way. This interface will then handle all the input (choice of maps, values of parameters, names of outputs, etc) expected from the user. To see the details and examples for the use of this function, see the manual page.

Here is an example showing the option descriptions of the v.db.addcolumn module:

#%module
#% description: Adds one or more columns to the attribute table connected to a given vector map.
#% keyword: vector
#% keyword: attribute table
#% keyword: database
#%end

#%option G_OPT_V_MAP
#%end

#%option G_OPT_V_FIELD
#% label: Layer number where to add column(s)
#%end

#%option
#% key: columns
#% type: string
#% label: Name and type of the new column(s) ('name type [,name type, ...]')
#% description: Data types depend on database backend, but all support VARCHAR(), INT, DOUBLE PRECISION and DATE
#% required: yes
#%end

The result of this definition are the following command line parameters:

> v.db.addcolumn --help

Description:
 Adds one or more columns to the attribute table connected to a given vector map.

Keywords:
 vector, attribute table, database

Usage:
 v.db.addcolumn map=name [layer=string] columns=string [--help]
   [--verbose] [--quiet] [--ui]

Flags:
 --h   Print usage summary
 --v   Sortie du module en mode bavard
 --q   Sortie du module en mode silence
 --ui  Force launching GUI dialog

Parameters:
      map   Name of vector map
             Or data source for direct OGR access
    layer   Layer number where to add column(s)
             Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
            par défaut: 1
  columns   Name and type of the new column(s) ('name type [,name type, ...]')
             Data types depend on database backend, but all support VARCHAR(), INT, DOUBLE PRECISION and DATE

The graphical result is this:

GUI interface resulting from above option definitions

Coding standards

A final word about coding standards: in order to keep the large code-base of GRASS GIS in a readable and understandable state, the GRASS GIS Development Team has developed over the time a series of coding standards that any submission of code into the GRASS GIS code base should follow. These coding standards are defined for several different languages, but notably for C and Python code and you are encouraged to read and integrate them into your coding from the start in order to avoid the drag of cleanup later own.

See also