V.krige GSoC 2009
v.krige: Python porting and wxGUI addition
Anne Ghisla's Google Summer of Code 2009 project, mentored by Martin Landa and Michael Barton
Aim of the project
As GRASS currently lacks kriging capability, kriging is performed via an add-on, v.autokrige, which delegates the analysis to R (package automap). This module is written in ksh and has the classic autogenerated GUI. The project aims to rewrite the module in Python, with a new wxPython GUI that lets the user refine parameters. I therefore plan to examine the present v.autokrige code and port it to Python, possibly improving it along the way, and add a wxPython GUI. Any remaining time will be dedicated to adding further functionality, starting with the most needed.
State of the art
v.krige is now in the GRASS trunk and devbr_6 code (r40048 or higher)!
At start-up, the module checks for the grass-python module, R, rpy2 and the R packages gstat and spgrass6. If something is missing, the module won't start. For now it aborts with a message in the shell; a more graceful popup window will be added in the near future.
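A minimal sketch of such a start-up check, assuming grass-python can be imported as grass.script and that R's Rscript front end is on the PATH. This is not v.krige's own code, which works through rpy2. It only shows the idea of collecting every missing dependency before anything else runs, so the user gets one complete message instead of a crash halfway through the process.

```python
import importlib.util
import shutil
import subprocess

# Assumed import names and package list; adjust to the local installation.
PYTHON_MODULES = ["grass.script", "rpy2"]
R_PACKAGES = ["gstat", "spgrass6"]


def missing_python_modules(names):
    """Return the names in `names` that the import system cannot find."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec() imports parent packages, so "grass.script" raises
            # ModuleNotFoundError (an ImportError subclass) if "grass" is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


def missing_r_packages(packages, rscript="Rscript"):
    """Return the R packages that cannot be loaded (all of them if R is absent)."""
    if shutil.which(rscript) is None:
        return list(packages)
    missing = []
    for pkg in packages:
        # requireNamespace() returns FALSE instead of raising an error.
        expr = (f"quit(save = 'no', status = "
                f"if (requireNamespace('{pkg}', quietly = TRUE)) 0 else 1)")
        result = subprocess.run([rscript, "-e", expr], capture_output=True)
        if result.returncode != 0:
            missing.append(pkg)
    return missing


def missing_dependencies():
    """Collect every missing dependency so they can be reported together."""
    return missing_python_modules(PYTHON_MODULES) + missing_r_packages(R_PACKAGES)
```

A caller would then abort once with a single message listing everything returned by missing_dependencies(), for example through sys.exit().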
Kriging relies on the R package gstat, the most widespread one. The geoR tab is now hidden because it doesn't provide any functionality yet. If anyone is interested in using geoR instead of gstat, I'll move this TODO higher up the list.
The list of point layers is provided by the VectorSelect class (gselect module), which keeps the layout consistent with wxGUI. Filtering is done once when the module starts. If the user later adds a suitable point layer, she/he has to press the Refresh button to add it to the list. Automatic refresh on popup was tested and proved too time- and resource-consuming.
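The filter-once, refresh-on-demand behaviour can be sketched as a small cache. The two callables are hypothetical stand-ins. In GRASS they could be backed by map listing and topology queries, but that wiring is an assumption and does not describe how VectorSelect works internally.

```python
class PointLayerCache:
    """Point vector maps, filtered once up front and re-filtered only on request."""

    def __init__(self, list_maps, has_points):
        # Hypothetical stand-ins:
        #   list_maps:  () -> iterable of vector map names
        #   has_points: map name -> bool (the slow per-map check)
        self._list_maps = list_maps
        self._has_points = has_points
        self.layers = []
        self.refresh()  # the only automatic filtering pass, at start-up

    def refresh(self):
        """Re-run the filter; meant to be bound to a Refresh button."""
        self.layers = sorted(m for m in self._list_maps() if self._has_points(m))
        return self.layers
```

Opening the combobox just reads cache.layers, so the expensive per-map checks run only at start-up and when the user explicitly asks for a refresh.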
Installation
For GRASS 6.5 <r40048 it is easy to install via g.extension (for GRASS 6.4 it is an add-on; from 6.5+/7 onwards it is included). Simply type in a GRASS shell:
g.extension v.krige
To install v.krige.py from source, please refer to Compile_and_Install#Scripts. Source is available on SVN:
svn co https://svn.osgeo.org/grass/grass-addons/vector/v.krige
The module requires GRASS >= 6.4 RC5, since it uses the new version of the grass-python library. At the moment, no backward compatibility with earlier versions is provided.
The module has the following dependencies:
- R 2.x: compatibility has not been tested exhaustively across versions, but an up-to-date R package will do the job in 99% of cases.
- R packages spgrass6, gstat and maptools
- Python module rpy2
Make sure that ALL these dependencies are installed and working. See below for platform-specific details.
Notes for Debian GNU/Linux
Install the dependencies:
aptitude install r-base python-rpy2
Attention! python-rpy IS NOT SUITABLE.
To install the R packages, either use R's install.packages() function (as root):
install.packages("gstat", dep=T)
install.packages("spgrass6", dep=T)
install.packages("maptools", dep=T)
or use the new Debian packages [6]. For 32-bit, add to the repository list:
deb http://debian.cran.r-project.org/cran2deb/debian-i386 testing/
or, for 64bit:
deb http://debian.cran.r-project.org/cran2deb/debian-amd64 testing/
and get the packages via
aptitude install r-cran-gstat r-cran-spgrass6 r-cran-maptools
Notes for Windows
At the moment, no backward compatibility is provided with OSGeo4W's packaged GRASS 6.4svn2, nor with WinGRASS. Testing is planned, because the latest WinGRASS includes the most recent Python library.
If you really need to run v.krige on Windows right now, I suggest compiling GRASS following this guide. You could also run Linux in a virtual machine, or install Linux on a separate partition of the hard disk. This is not as painful as it sounds, and there are plenty of guides on the Internet to help you.
Bulletin Board
- TODOs
- [permanent todo] Cleanup code! Refactoring brings refactoring.
- wxGUI: Split widget creation and layout - makes more readable code
- Solve double parsing issue from RunCmd. This leads to unnecessary double dependency check.
- Continue integrating gstat functions - universal kriging first.
- Add a region resolution parameter to the interface, showing the current value and letting the user modify it before kriging. Higher resolution slows down computation and gives better results, so the user should get clear feedback (I have been surprised by long runs more than once, because I was not aware of the resolution).
- wxGUI: GetVector() should check for name validity, if a string is inserted?
- IDEAS TO DISCUSS
- A splash screen at module load will be implemented, but shown only if the number of maps exceeds a threshold [Markus Neteler's suggestion]
- Create a package to delegate dependencies check? (Stefan Krüger's suggestion)
- BROKEN WINDOWS (from Martin Fowler's great book [0])
- Imperfect logging - relies on RunCmd...
- The RBookPage::refresh() function has an endless loop that keeps the R graphics window responsive, even after the user has closed that window.
- spgrass6::readVECT6() and mapset name - see this thread.
- FUTURE PLANS
- Rewrite v.krige in order to remove R dependency. It is not nice to add another program as dependency of a core module, a library is acceptable. Possible candidates:
- gstat - Pro: it has a native GRASS binding. Con: it has not been developed as a standalone program since 2003, in favour of the R package gstat.
- HPGL http://hpgl.aoizora.org/ - Python/C++ library with Python API. Pro: Very rich set of functions and parallelisation. Con: Not supported on Mac
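The region resolution TODO in the Bulletin Board list is mostly arithmetic. The number of cells to predict, and with it the running time, grows with the inverse square of the resolution. Below is a minimal sketch of the feedback such a parameter could show. The function names and message text are hypothetical, and the region is assumed to be aligned, meaning each extent is a whole multiple of its resolution.

```python
def region_size(north, south, east, west, nsres, ewres):
    """Rows, columns and total cells of a GRASS-style region.

    Assumes an aligned region; round() only absorbs floating-point noise.
    """
    rows = round((north - south) / nsres)
    cols = round((east - west) / ewres)
    return rows, cols, rows * cols


def resolution_feedback(north, south, east, west, res):
    """Hypothetical one-line summary a GUI could show before a kriging run."""
    rows, cols, cells = region_size(north, south, east, west, res, res)
    return f"{rows} x {cols} = {cells:,} cells to predict at resolution {res}"
```

For a hypothetical 10 km x 10 km region, moving from 10 m to 5 m cells quadruples the number of predictions, from 1,000,000 to 4,000,000.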
Documentation
Notes on v.krige usage, examples and tips are available on the module's help page, accessible via the Help button of the GUI.
Kriging theory is covered in Isaaks and Srivastava's "An Introduction to Applied Geostatistics" [1], another great book.
Users feedback and other kriging software
- A call for users was posted on the gfoss.it, grass-it, grass-users, r-sig-geo and r-sig-ecology mailing lists. Thanks to Giovanni Manghi and João Tiago from GFOSS.pt and to Giovanni Allegri for user advice, and to Edzer Pebesma, Paul Hiemstra, Ebrahim Jahanshiri and Dylan Beaudette for support on the R side.
- There are (have been) some GRASS modules that perform kriging, like:
- v.surf.krige, v.variogram: no longer developed by their authors, difficult to port.
- s.surf.krig: deprecated since GRASS 6 series.
- scripts including R and spgrass6: these are the concrete basis on which the module is built.
- Obtained access to ArcGIS to compare interface and functionality, because AFAIK it is the most widespread tool used for kriging. Isatis [2] is also a valuable source of inspiration for interface design.
Planned Timeline
Draft and definitive:
- create a wxPython interface for ordinary kriging. Will follow Humane Interface rules [3].
- integrate R package automap
- midterm deadline: have a working module that performs ordinary kriging.
- integrate more R functions from gstat and geoR, giving the user the choice between gstat and geoR (kriging results vary depending on the implementation)
Weekly Reports, from SoC mailing list [4]
Report #1, 29 May
- Done: I dedicated this week to documentation and discussion with users and developers.
- Documentation: I'm reading "An introduction to Applied Geostatistics" [1], as I need to understand the theory behind kriging functions provided by R. On wxPython side, wxPython wiki is the main source of information, together with the code of GRASS wxPython interface.
- Community discussion: Feedback on interface design and R has been collected on various mailing lists. Also, a group of Portuguese ArcGIS users is interested in giving advice and testing the new module.
- Planned for next week
I plan to define which functions (and consequently which R packages) will be included in the module. I plan to include the package automap (a wrapper for gstat) first, then advanced gstat functions, then geoR (an "ecological vicariant" of gstat), all available on CRAN [5]. This will allow R users to keep using their preferred functions, as kriging results are implementation-dependent. Then I'll work on the wxPython interface and produce a draft as soon as possible. The interfaces I use as models are ArcGIS' kriging module and Isatis. I won't replicate their structure. Instead, I'll look at the features they provide and create the v.autokrige interface following the Humane Interface guidelines.
- Bottleneck(s)
I'm having some difficulties, mainly with wxPython because I have never used it before, and also with kriging for the same reason :) but I'm neither worried nor blocked. Last year's GSoC project was a gamble and I succeeded. This year it's even harder, but I can rely on more experience and, as always, on mentor and community support.
Report #2, 5 June
- Done
- Finished "An Introduction to Applied Geostatistics", a great source of information on what the module is supposed to do. Now I can be both developer and user, without discarding any of the hints from other users.
- Some progress on the interface: I'm getting used to wxWidgets, even if the lack of a graphical designer slows down my work. wxDesigner seems good, but I think it is also good for me to learn the meaning of each line of code. The interface is nearly ready for automap and GRASS integration.
- Planned for next week
End of interface essential features and integration of automap most automated functions - ideally, the user will pick up the point layer containing data and press OK.
- Bottleneck(s)
The only bottleneck is handling wxWidgets, though less so than a week ago.
Report #2.5, 11 June
First of all, the interface is almost finished. The layout includes a notebook with one page for each R package (automap, gstat and geoR), each showing the available options. An alternative layout would be a choicebox with the three packages at the top of the Kriging section, which redraws the section according to the user's choice. Let me know which one you prefer, and feel free to suggest something new.
The R-GRASS integration is on its way: autofitting the variogram on a map obtained from a DEM sample works in the proof of concept. rpy2 is extremely helpful and does not require much more code than the original R code. More to come in the next days.
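Variogram fitting, whether automatic or interactive, starts from the empirical semivariogram. For each distance class (lag), it is half the mean squared difference between the values of all point pairs whose separation falls in that class. The sketch below uses the classical (Matheron) estimator in plain Python, with fixed-width lag classes as an assumed binning scheme. It illustrates the quantity gstat works with. It is not gstat's code, and it does not reproduce gstat's default cutoff or lag width.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, lag_width, n_lags):
    """Classical estimator gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) per lag class.

    points: sequence of (x, y, z) tuples. Pairs at distance n_lags * lag_width
    or more are ignored. Returns (mean distance, gamma, number of pairs) for
    every non-empty lag class, nearest first.
    """
    sq_diff = [0.0] * n_lags
    dist_sum = [0.0] * n_lags
    n_pairs = [0] * n_lags
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):  # all O(n^2) pairs
        d = math.hypot(x2 - x1, y2 - y1)
        k = int(d // lag_width)  # index of the lag class containing d
        if k < n_lags:
            sq_diff[k] += (z1 - z2) ** 2
            dist_sum[k] += d
            n_pairs[k] += 1
    return [(dist_sum[k] / n_pairs[k], sq_diff[k] / (2 * n_pairs[k]), n_pairs[k])
            for k in range(n_lags) if n_pairs[k]]
```

A variogram model is then fitted to these (distance, gamma) points, either automatically or by hand through the interface controls.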
The idea for variogram fitting is to plot the variogram in a new frame and add some controls like sliders and/or text boxes to fill up with nugget, sill and range values. This will involve Python graphics, not R, as the former is more flexible.
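The three controls correspond to the parameters of a variogram model. Below is a minimal sketch of the spherical model, one of the standard choices, which a plot could redraw whenever a slider changes. Here sill means the total sill, nugget included. That is an assumption of this sketch: gstat's vgm() instead takes the partial sill, psill = sill - nugget. For 0 < h < range the model is gamma(h) = nugget + (sill - nugget) * (1.5 h/range - 0.5 (h/range)^3). It is flat at sill from the range onwards, and 0 at h = 0.

```python
def spherical(h, nugget, sill, range_):
    """Spherical semivariogram model; `sill` is the total sill (nugget included)."""
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    if h == 0:
        return 0.0          # by definition; the nugget is the jump just above 0
    if h >= range_:
        return float(sill)  # flat beyond the range
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def model_curve(nugget, sill, range_, max_lag, steps=50):
    """(h, gamma) points to redraw whenever a slider or text box changes."""
    return [(max_lag * i / steps, spherical(max_lag * i / steps, nugget, sill, range_))
            for i in range(steps + 1)]
```

A slider handler (not shown) would only need to call model_curve() with the current nugget, sill and range, and replace the plotted line.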
Report #3, 12 June
This weekly report will be very short, because I sent this email (Report #2.5, see above) yesterday and have made little progress since.
I'm having some problems getting the autoKrige() function to work, AFAIK because of how projection information is handled.
Report #4, 19 June
This week I made steady progress on the module. What I've done:
- renamed the module v.krige, as its features will go beyond automatic kriging.
- added a choicebox listing only numerical columns, as interpolation will be based on such variables
- refactoring on interface population - more to come
- started documentation page
Next week I plan to create the splash screen shown during the dependency check and data load, and to examine how to integrate gstat functions.
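The numeric-only column filter can be sketched as follows. It assumes column descriptions arrive as "TYPE|name" lines, which is assumed here to be the layout printed by v.info -c. The set of numeric SQL type names is also an assumption and should be checked against the database driver in use. Excluding the key column cat is a design choice of this sketch, because interpolating category numbers makes no sense.

```python
# Assumed numeric type names; drivers differ, so check them against your database.
NUMERIC_TYPES = {"INTEGER", "SMALLINT", "REAL", "DOUBLE PRECISION"}


def numeric_columns(lines, key="cat"):
    """Names of numeric columns from 'TYPE|name' lines, skipping the key column."""
    columns = []
    for line in lines:
        if "|" not in line:
            continue  # e.g. a header line without a column description
        ctype, name = (part.strip() for part in line.split("|", 1))
        if ctype.upper() in NUMERIC_TYPES and name != key:
            columns.append(name)
    return columns
```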
Report #5, 26 June (at OSGeo Bolsena Hacking Event)
Just a few minutes ago I succeeded in completing the kriging procedure with gstat functions. It runs with a proof-of-concept dataset based on the Spearfish data. Martin helped me a lot in getting the standard wxGUI comboboxes to work properly with filters.
Next week I plan to adapt the interface to each R package's available options and consider how to solve lag issues in populating interface.
The blocking issue with autoKrige() and projections no longer applies, because I now create the grid myself based on the GRASS region.
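Building the prediction grid from the GRASS region amounts to generating one point at each cell centre. Below is a minimal plain-Python sketch that goes row by row from north to south, like a GRASS raster. Parameter names follow g.region's n/s/e/w/nsres/ewres, and the region is assumed to be aligned. On the R side the same coordinates would go into a grid object passed to gstat, which is not shown here.

```python
def cell_centres(north, south, east, west, nsres, ewres):
    """Yield (x, y) for every cell centre, row by row from the north edge.

    Assumes an aligned region: each extent is a whole multiple of its resolution.
    """
    rows = round((north - south) / nsres)
    cols = round((east - west) / ewres)
    for r in range(rows):
        y = north - (r + 0.5) * nsres  # centre of row r
        for c in range(cols):
            x = west + (c + 0.5) * ewres  # centre of column c
            yield x, y
```

Using a generator avoids holding every coordinate pair in memory at once, which matters at fine resolutions.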
Report #6, 3 July
This week I worked on removing parameters hardwired in the code and binding them to the interface instead. It is now possible to use all widgets in the gstat and automap pages: pick the model from a list; set sill, nugget and range; set the output raster map name and optionally overwrite it. I'm going to implement the CLI in the next few days. This will very likely require reorganising the code and clearly splitting the interface from the model.
Blocking issues - none.
Report #7, midterm, 10 July
This week has been full of improvements:
- CLI with optional arguments is up and running
- interface creation is no longer delayed by vector map filtering
- dependencies are checked all at the beginning - no risk of mid-process crashes if something is missing
- automap page merged with gstat: they share the same algorithms
- geoR page hidden, no implementation yet
- deep refactoring of the rpy code, moved into a separate controller class
- interface fills up all options except input data map - minimum interaction required (2 clicks)
Next week I plan to implement the splash screen and add further functionality, hopefully different kriging techniques.
Blocking issues - none.
Report #8, 17 July
This week I added a Command output tab and an option for block kriging, and updated the documentation with more precise information on dependencies. I'll add the splash screen only once the dependency check and the command output are fixed, and next week I plan to work on these two issues.
Nothing in particular is blocking progress, just a few harder stones now and then.
Report #9, 24 July
This week I didn't work on the project because I attended my last university course (a botanical survey at Stelvio pass) and only came back late last night.
Next week I plan to solve the g.parser issue that causes a double dependency check, and the issue with the sill, nugget and value parameters, which are optional for R. A general cleanup of the code is also welcome...
No blocking issues atm.
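One possible way to avoid repeating the dependency check when the script is re-run through the parser is a guard variable in the environment, because child processes inherit it. This is a sketch of an idea, not the fix adopted in v.krige. The variable name VKRIGE_DEPS_CHECKED is hypothetical, and the approach assumes the check runs before the script hands control to the parser.

```python
import os

GUARD_VAR = "VKRIGE_DEPS_CHECKED"  # hypothetical environment variable name


def check_dependencies_once(check):
    """Run `check` (returns a list of missing items) unless an earlier process did.

    The guard is set only after a successful check. Changes to os.environ are
    inherited by child processes started afterwards, e.g. a re-executed script.
    """
    if os.environ.get(GUARD_VAR) == "1":
        return []
    missing = check()
    if not missing:
        os.environ[GUARD_VAR] = "1"
    return missing
```

Setting the guard only on success means that a failed check is repeated, rather than hidden, on the next run.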
Report #10, 31 July
Good news from v.krige:
- fixed optional parameters also in interface - activated by a checkbox, otherwise R's NA value is correctly set
- interactive variogram fit is on its way - stay tuned!
Next week I plan to set up interactive variogram fitting using matplotlib functions, and to get the code as stable as possible for the feature freeze.
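The optional-parameter behaviour in this report, where an unticked checkbox yields R's NA, can be sketched as a helper that renders the variogram model call as R source text. Both the helper and the text-building approach are illustrative assumptions. Through rpy2 one would normally pass R objects rather than build code strings. The argument names follow gstat's vgm(psill, model, range, nugget), and this text does not say whether the interface's sill field holds a total or a partial sill.

```python
def r_number(value):
    """Render an optional number as an R literal; None (checkbox unticked) -> NA."""
    return "NA" if value is None else repr(float(value))


def vgm_call(model, psill=None, range_=None, nugget=None):
    """Hypothetical helper: a gstat vgm() call as R text, unticked values as NA."""
    return (f'vgm(psill = {r_number(psill)}, model = "{model}", '
            f'range = {r_number(range_)}, nugget = {r_number(nugget)})')
```

For example, vgm_call("Sph", psill=1, range_=500) gives vgm(psill = 1.0, model = "Sph", range = 500.0, nugget = NA), leaving the nugget for R to handle.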
Report #11, 7 August
This week I worked on error handling in the GUI, adding checks on input data and parameters and hiding the Run and Plot buttons unless all options are valid. Variogram plotting is on its way. I discarded matplotlib to avoid further dependencies, in favour of wx.lib.plot.
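The validation that decides whether Run and Plot are offered can be kept free of wx code, which also makes it easy to test. Below is a minimal sketch. The map-name rule is a deliberately simplified assumption and does not reproduce GRASS's own legal-name check. Requiring a valid output name for Run but not for Plot is a design choice of this sketch. The resulting booleans could then drive wx's Show() or Enable() on each button.

```python
import re

# Simplified, assumed naming rule: a letter followed by letters, digits or underscores.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_map_name(name):
    return _NAME_RE.fullmatch(name) is not None


def button_states(input_map, column, output_name, point_maps, numeric_cols):
    """Return (run_ok, plot_ok) for the current widget values."""
    data_ok = input_map in point_maps and column in numeric_cols
    plot_ok = data_ok  # a variogram only needs the input data
    run_ok = data_ok and is_valid_map_name(output_name)
    return run_ok, plot_ok
```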
Report #12, 14 August
This is the last SoC report, and it brings variogram plotting into v.krige! After a long search for the lightest and cleanest implementation, R plotting emerged as the best solution. On user demand, an R graphics window plots the variogram and refreshes it, without interfering with wxGUI. The documentation includes a full example using the Spearfish dataset and all the information about dependencies.
Many thanks to all who have helped me writing v.krige: GRASS and R developers, OSGeo SoC crew, wise friends, all testers. I hope this module will attract people towards GRASS and R, and provide a valid alternative to closed-source kriging tools.
References
- [0] Fowler, Beck, Brant, Opdyke, Roberts, 1999: "Refactoring: Improving the Design of Existing Code" (ISBN-10: 0201485672; ISBN-13: 978-0201485677)
- [1] Isaaks and Srivastava, 1989: "An Introduction to Applied Geostatistics" (ISBN 0-19-505013-4)
- [2] http://www.geovariances.com/software/video-data-investigation-with-isatis-exploratory-data-analysis-ar0353.html
- [3] Jef Raskin, 2000: "The Humane Interface: New Directions for Designing Interactive Systems" (ISBN 0-201-37937-6)
- [4] OSGeo SoC mailing list http://lists.osgeo.org/pipermail/soc/
- [5] CRAN: Spatial view http://cran.r-project.org/web/views/Spatial.html
- [6] cran2deb repository https://stat.ethz.ch/pipermail/r-sig-debian/2009-July/000805.html