GRASS GSoC 2013 GRASS GIS Interactive Scatter Plot Tool


(See also other GRASS GSoC 2013 projects)

Student Name: Stepan Turek, Czech Technical University in Prague
Organization: OSGeo - Open Source Geospatial Foundation
Mentor Name: Martin Landa
Backup mentor: Michael Barton
Title: GRASS GIS Interactive Scatter Plot Tool

The goal of the project is to develop an interactive scatter plot tool integrated into wxGUI. The tool will improve the GRASS GIS supervised classification workflow and will also be useful for raster data analysis in general. An additional goal is to extend the plotting capabilities of GRASS GIS by integrating the Matplotlib plotting library into wxGUI.



Supervised classification is a common analysis performed on remote sensing data. GRASS GIS offers the tools needed to carry out the classification workflow. However, there are some weak spots that can be improved.

One such weak spot is the limited ability to analyze classes (categories) in real time according to the data chosen in training areas. Recently, the process of training area selection has been significantly improved by the wxGUI Supervised Classification Tool. This tool simplifies the definition of classes and their training areas. It also shows some basic statistics for classes and bands.

GRASS GIS has a Bivariate Scatterplot Tool, which creates a scatter plot for a selected combination of bands. However, this tool is not connected to the Supervised Classification Tool, and therefore it cannot be used in the process of training area selection.



The idea

The main idea of the project is to develop a tool that allows the user to look at raster data from new points of view, unveiling relations that would otherwise stay hidden.

The tool will be fully integrated into wxGUI, so it will be connected with existing components (the Map display window and the Supervised Classification Tool).

Thanks to the connection with the Supervised Classification Tool, it will be possible to highlight plotted points according to the pixels currently chosen for each class by its training areas. With this information, the user will be able to gradually change the configuration of training areas to achieve better class separability during supervised classification. The tool will also be able to plot confidence ellipses for classes.
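The proposal does not say how the confidence ellipses would be computed. One common construction, given here only as a sketch, assumes that the values of a class in the two plotted bands are roughly bivariate normal and treats the estimated mean and covariance as exact. With $\mathbf{x}_i$ the pair of band values of the $i$-th training pixel of the class, $\lambda_1, \lambda_2$ the eigenvalues of $\Sigma$, $\sigma_{jk}$ its elements, and $p$ the chosen confidence level:

```latex
\[
\boldsymbol{\mu} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i, \qquad
\Sigma = \frac{1}{n-1}\sum_{i=1}^{n}
         (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^{\mathsf T}
\]
\[
(\mathbf{x} - \boldsymbol{\mu})^{\mathsf T}\, \Sigma^{-1}\, (\mathbf{x} - \boldsymbol{\mu})
\le \chi^2_{2,\,p},
\qquad \chi^2_{2,\,p} = -2\ln(1 - p)
\]
\[
a_k = \sqrt{\chi^2_{2,\,p}\,\lambda_k} \quad (k = 1, 2), \qquad
\theta = \tfrac{1}{2}\operatorname{atan2}\!\left(2\sigma_{12},\ \sigma_{11} - \sigma_{22}\right)
\]
```

Here $a_1, a_2$ are the semi-axes of the ellipse centred at $\boldsymbol{\mu}$ and $\theta$ is the angle of the major axis measured from the axis of the first band. For $p = 0.95$ the threshold is $\chi^2_{2,\,p} \approx 5.99$.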

The scatter plot tool will be able to work with multiple plots, which will represent multiple combinations of raster bands. This will be useful for imagery data, which usually comprise more than two bands. All such plots will be interactively connected: the user will be able to define areas in plots, and the tool will highlight the matching pixels in the Map display window and the corresponding points in other plots. These areas will be grouped into categories, and a category manager will allow the user to define the plotting order of categories, plotting style, etc.
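The link between an area defined in a plot and the pixels in the map can be sketched as a per-pixel mask. The following is only a minimal illustration, not the planned implementation: it assumes 8-bit bands held in memory and a rectangular selection, and the names scatt_rect and scatt_select_pixels are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rectangular selection drawn in a scatter plot, expressed
 * in band values (x axis = first band, y axis = second band). */
struct scatt_rect {
    int x_min, x_max;   /* inclusive value range in the first band  */
    int y_min, y_max;   /* inclusive value range in the second band */
};

/* Mark every pixel whose (band_x, band_y) value pair falls inside the
 * selection. mask[i] is set to 1 for selected pixels and 0 otherwise.
 * Assumption: both bands are 8-bit and cover the same n_pixels cells.
 * Returns the number of selected pixels. */
size_t scatt_select_pixels(const unsigned char *band_x,
                           const unsigned char *band_y,
                           size_t n_pixels,
                           const struct scatt_rect *sel,
                           unsigned char *mask)
{
    size_t i, n_selected = 0;

    assert(band_x && band_y && sel && mask);

    for (i = 0; i < n_pixels; i++) {
        int inside = band_x[i] >= sel->x_min && band_x[i] <= sel->x_max &&
                     band_y[i] >= sel->y_min && band_y[i] <= sel->y_max;

        mask[i] = (unsigned char)inside;
        n_selected += (size_t)inside;
    }
    return n_selected;
}
```

Because the mask is indexed by pixel and not by plot coordinates, the same mask could be used both to highlight pixels in the Map display window and to find the corresponding points in plots of other band combinations.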

The tool can also be used for identification of pure pixels. If the bands produced by principal component analysis (PCA) of the analyzed imagery are added to the tool, it will be possible to identify endmember points, which lie in the corners of the PCA space scatter plot. By selecting these points, it will be possible to see the pure pixels in the Map display window and the corresponding points in other plots. Pure pixels are suitable as class training areas because they are not mixed with other classes (endmembers) and therefore represent only the surface of an individual class (endmember).

It will be possible to create training areas from a scatter plot. This feature could be used to create training areas from pure pixels.

The tool could be useful for raster data analysis in general, so it will also be integrated into the Map display window. For instance, it could help to reveal correlation in the analyzed data or to detect outliers. It will be possible to create a mask from selected pixels belonging to a category. With this feature, it will be easy to remove the effect of outliers in subsequent analyses.

Because the analyzed data can be large, significant attention will be paid to making the tool memory efficient and fast.
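One possible way to keep memory use independent of raster size is to store counts per value pair instead of individual points. The sketch below illustrates that idea under the assumption of 8-bit bands read in chunks (for example row by row). It is not a description of the final back-end, and SCATT_BINS, scatt_clear and scatt_accumulate are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCATT_BINS 256  /* assumption: 8-bit bands, one bin per value */

/* Reset the table of counts before the first chunk is accumulated. */
void scatt_clear(unsigned int *counts)
{
    assert(counts);
    memset(counts, 0, sizeof(unsigned int) * SCATT_BINS * SCATT_BINS);
}

/* Accumulate one chunk (e.g. one raster row) of two bands into a
 * SCATT_BINS x SCATT_BINS table. counts[y * SCATT_BINS + x] holds how
 * many pixels have value x in the first band and y in the second. */
void scatt_accumulate(const unsigned char *band_x,
                      const unsigned char *band_y,
                      size_t n_pixels,
                      unsigned int *counts)
{
    size_t i;

    assert(band_x && band_y && counts);

    for (i = 0; i < n_pixels; i++)
        counts[(size_t)band_y[i] * SCATT_BINS + band_x[i]]++;
}
```

With 4-byte counters the table takes 256 KiB per plot regardless of how many pixels the raster has, and only one chunk of each band needs to be in memory at a time.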

The plotting part of the tool will use the Matplotlib library, which will be integrated into wxGUI. The integration should make plotting in wxGUI easier and the plotting code reusable, e.g. for setting plot properties or implementing more abstract functions.

The exact way of integration will be determined once I have more experience with Matplotlib.

Usage of the integrated Matplotlib will be demonstrated by developing a feature that creates different graph types (e.g. line graph, bar graph) from data in vector or raster maps.

Raster library changes suggestion

WARNING: This section is under development.

WARNING: These ideas may be completely off :-)

Important Raster Library structures

struct RastLib__ {              /* statically initialized when the library is loaded */
    struct R__ *r;              /* array of R__ instances; the instance with index 0 would be
                                   initialized when Rast_init is called (it would be equivalent
                                   to the currently statically initialized R__ instance) */

    int fileinfo_count;
    struct fileinfo **fileinfo; /* array of pointers to used rasters; the fd number is assigned
                                   according to the position in this array */
};

struct R__ {
    int rd;                     /* rd number would be the equivalent of the fd number for the
                                   R__ structure; it would be the position in the r array of
                                   the RastLib__ structure */
    int fileinfo_count;
    struct fileinfo **fileinfo; /* array of pointers to fileinfo (rasters) belonging to this
                                   R__ environment */
};

struct fileinfo {               /* Information for opened cell files */
    int fd;                     /* fd number of the fileinfo instance */
    struct R__ *r;              /* pointer to the R__ structure the instance belongs to */
};


Backward compatibility

Basically, this solution does not break backward compatibility for functions that take the fd identifier as an argument. With the fd identifier we can find the corresponding fileinfo instance in RastLib__, and that instance holds a pointer to its R__ instance. Unfortunately, there is another, more problematic kind of function.

It should be clear from this example:
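The lookup described above could look roughly like the following sketch. The demo_ structures are simplified, hypothetical stand-ins for the proposed RastLib__, R__ and fileinfo structures and are not part of the raster library; the rows and cols members are placeholders for the window settings.

```c
#include <assert.h>
#include <stddef.h>

struct demo_env;                       /* stands in for struct R__ */

struct demo_fileinfo {                 /* stands in for struct fileinfo */
    int fd;                            /* position in the global array */
    struct demo_env *r;                /* environment the raster belongs to */
};

struct demo_env {
    int rd;                            /* position in the array of environments */
    int rows, cols;                    /* placeholder for the window settings */
};

struct demo_rastlib {                  /* stands in for struct RastLib__ */
    int fileinfo_count;
    struct demo_fileinfo **fileinfo;   /* indexed by fd; unused slots are NULL */
};

/* Resolve the environment of an open raster from its fd alone.
 * Returns NULL when fd is out of range or the slot is unused. */
struct demo_env *demo_env_from_fd(const struct demo_rastlib *lib, int fd)
{
    assert(lib);

    if (fd < 0 || fd >= lib->fileinfo_count || !lib->fileinfo[fd])
        return NULL;
    return lib->fileinfo[fd]->r;
}
```

A function that already receives fd could call such a helper internally, which is why its public interface would not need to change.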

int input_fd;
const char *inmap;              /* name of the input raster map */
struct Cell_head window;
RASTER_MAP_TYPE data_type;
void *cell;

input_fd = Rast_open_old(inmap, ""); /* there would also be something like
                                        Rast_open_old_dynamic(const char *name, const char *mapset, struct R__ *R); */
data_type = Rast_get_map_type(input_fd); /* this function is nice, it does not require any change */
cell = Rast_allocate_buf(data_type); /* this function is not so nice: it reaches the static R__ structure through
                                        Rast_window_cols(), and with its current interface we have no way to tell it
                                        that we want a cell buffer for a raster using a dynamic R__ instance */
Rast_get_window(&window); /* also a problem: which window do we want? (it expects only one static R__) */

Rast_close(input_fd); /* no problem */

As the example shows, the biggest problem of the proposed changes is dealing with functions such as Rast_allocate_buf or Rast_get_window, which rely on the presence of only one static R__. This includes, e.g., functions in set_window.c, auto_mask.c and others.

I see three ways of dealing with this kind of function:

Option 1

We can change all these functions, e.g.:

void *Rast_allocate_buf(RASTER_MAP_TYPE data_type, struct R__ *R);
void Rast_get_window(struct Cell_head *window, struct R__ *R);

This would break backward compatibility, and we would need to change all the modules immediately.

Option 2

To be fully backward compatible, we can add new functions, e.g.:

void *Rast_allocate_buf(RASTER_MAP_TYPE data_type);
void *Rast_allocate_buf_d(RASTER_MAP_TYPE data_type, struct R__ *R);
void Rast_get_window(struct Cell_head *window);
void Rast_get_window_d(struct Cell_head *window, struct R__ *R);

However, at some point in the future (G8?) this would lead to only the Rast_get_window_d style functions existing, and the functions with nice names would be dropped.

Option 3

Maybe the best solution would be to rename all these problematic functions (it should not be so hard):

void *Rast_allocate_buf_s(RASTER_MAP_TYPE data_type);
void *Rast_allocate_buf(RASTER_MAP_TYPE data_type, struct R__ *R);
void Rast_get_window_s(struct Cell_head *window);
void Rast_get_window(struct Cell_head *window, struct R__ *R);

Apart from changing these function names, we do not need to modify the modules immediately. The main advantage is that in the future we would end up with only the nice names. In other words, we would arrive at Option 1 in the future.
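The renaming in Option 3 can be pictured as thin wrappers: the "_s" variant keeps the old argument list and forwards to the new function with the default environment. The sketch below uses hypothetical demo_ names and a simplified buffer size so that it stays separate from the real raster library functions; the default of 8 columns is an arbitrary placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_env {               /* stands in for struct R__ */
    int window_cols;            /* columns of the current window */
};

/* Stands in for the single static R__ instance used by existing modules.
 * The value 8 is an arbitrary placeholder for this illustration. */
static struct demo_env demo_default_env = { 8 };

/* New-style function: the environment is an explicit argument.
 * Allocates a zeroed row buffer with one cell per window column. */
void *demo_allocate_buf(size_t cell_size, const struct demo_env *env)
{
    assert(env && cell_size > 0);
    return calloc((size_t)env->window_cols, cell_size);
}

/* Renamed old-style function ("_s" for static): keeps the old argument
 * list and forwards to the new function with the default environment. */
void *demo_allocate_buf_s(size_t cell_size)
{
    return demo_allocate_buf(cell_size, &demo_default_env);
}
```

With this layout an existing module would only need its calls renamed to the "_s" variant, while new code could pass its own environment directly.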

Project plan

| Period | Task | Status | Notes |
|---|---|---|---|
| May 27 - June 16 | Bonding period | | |
| June 17 | back-end integration with map window | | |
| June 24 | back-end integration with map window | | |
| July 1 | back-end integration with wxGUI Supervised Classification Tool | | |
| July 8 | back-end integration with wxGUI Supervised Classification Tool | | |
| July 15 | back-end integration with wxGUI Supervised Classification Tool | | |
| July 22 | testing and optimization of back-end | | |
| July 29 | testing and optimization of back-end | | |
| August 2 | Mid-term evaluation | | The goal: back-end is finished |
| August 5 | design of Matplotlib integration into wxGUI | | |
| August 12 | development of front end of the scatter plot tool | | |
| September 2 | implementation of the plotting feature for vector/raster data | | |
| September 9 | implementation of the plotting feature for vector/raster data | | |
| September 16 | testing, documentation | | |
| September 23 | Final evaluation | | |




Week 1 (June 17)

  • worked on integration of the scatter plot tool with mapwindow

Week 2 (June 24)

  • implemented highlighting of pixels in mapwindow according to selected areas in scatter plots

Figure: Highlighted pixels in mapwindow (red color) corresponding to selected areas in scatter plots (green color) (2013-7-21)

Weekly Reports