Agriculture and HPC
Introduction: Agricultural activity monitoring, e.g. quantifying irrigation scheduling, tracing soil hydraulic properties, and generating the crop calendar, is necessary for efficient food security management at the country level. Near-real-time monitoring or prediction of crop parameters such as planting date, date of emergence, extent, acreage, planting intensity, water stress, biomass, and yield is important. It can contribute to better policymaking, timely countermeasures, optimized distribution of water resources, damage assessment, and ultimately to food security and a stable market. Agricultural researchers also analyze various information about crops so that measures can be taken when problems arise. In particular, when an ongoing experiment covers a large area such as a country, satellite imagery plays a vital role by providing useful information over large or provincial areas. However, some crop parameters, such as sowing dates, cropping intensity, growth, and stress, are not visible in satellite images, which poses a practical problem: those parameters cannot be generated or observed remotely. Collecting such data requires periodic field experiments, a time-consuming, complex, and expensive procedure. Crop (simulation) models can help overcome this problem. Usually, crop models calculate the missing information by analyzing crop information together with data from real field experiments. Indirect methods such as inverse modeling with a crop model can be used to obtain those basic input parameters. One such method is manual calibration by a "trial and error" procedure, which is highly subjective and time consuming, and whose associated uncertainty cannot be quantified. A more robust way of inverse modeling is to combine the model with data assimilation driven by an optimization algorithm.
Thus, the challenge for the future is to obtain from satellite images, besides the more directly observable data (land cover, leaf area index, digital elevation model, and evapotranspiration), the non-visible data (soil characteristics, groundwater depth, and irrigation practices).
Crop Assimilation Model (CAM): Crop models such as Soil-Water-Atmosphere-Plant (SWAP) [1] or the Decision Support System for Agrotechnology Transfer (DSSAT) [2] can simulate soil, water, and crop processes and serve as crop productivity monitoring tools. A Crop Assimilation Model (CAM) estimates the parameters of a crop model from satellite images. A new methodology, CAM-GA, was developed in [3] to assimilate the parameters of the SWAP crop model with Remote Sensing (RS) data; the assimilation procedure is optimized by an evolutionary search technique, the Genetic Algorithm (GA). Because the SWAP input parameters change from pixel to pixel, the assimilation search space is very large, and evolutionary search algorithms perform well under such conditions. Similar works [4], [5] combined remotely sensed information with a binary GA [6] and the SWAP model to optimize soil hydraulic parameters. CAM with a double-layer GA, CAM-DLGA [7], uses directly visible multi-resolution RS images (ASTER [8], MODIS [9]) and inversely assimilates them with SWAP model data to estimate the non-visible model parameters. One advantage of GAs is that they operate well even in domains where little information is known. However, the main difficulty with a GA is choosing an appropriate set of parameters, such as population size, number of generations, selection rate, crossover probability, and mutation probability. Other models, e.g. CAM-PSO [10] and CAM-PEST [11], use different search techniques and provide functionality similar to CAM-GA. However, processing agricultural information with CAM has a practical problem: it requires a huge amount of processing time. It therefore becomes necessary to introduce higher processing power, such as High Performance Computing (HPC) technologies.
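The inverse-modeling idea behind a GA-based CAM can be illustrated with a minimal sketch. It is not the CAM-GA implementation from [3]: the function toy_crop_model is a hypothetical logistic curve standing in for SWAP, the two parameters (sowing day and growth rate) and their bounds are assumptions, and the GA is a simple real-valued variant with tournament selection, arithmetic crossover, Gaussian mutation, and elitism. The GA settings it exposes (population size, number of generations, crossover and mutation probability) are the ones named above as hard to choose.

```python
import math
import random


def toy_crop_model(sowing_day, growth_rate, days):
    """Hypothetical stand-in for a crop model such as SWAP.

    Returns a logistic, LAI-like value for each day in `days`.
    """
    return [6.0 / (1.0 + math.exp(-growth_rate * (d - sowing_day - 40.0)))
            for d in days]


def rmse(simulated, observed):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed))
                     / len(observed))


def run_ga(observed, days, bounds, pop_size=30, generations=40,
           crossover_prob=0.8, mutation_prob=0.2, seed=0):
    """Search for model parameters whose simulation best matches `observed`.

    Returns the best parameter vector and the best cost per generation.
    """
    rng = random.Random(seed)

    def cost(ind):
        return rmse(toy_crop_model(ind[0], ind[1], days), observed)

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = min(rng.sample(population, 3), key=cost)
            p2 = min(rng.sample(population, 3), key=cost)
            child = list(p1)
            if rng.random() < crossover_prob:
                w = rng.random()  # arithmetic crossover weight
                child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_prob:
                    child[i] += rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i]))  # clamp to bounds
            next_gen.append(child)
        population = next_gen
    best = min(population, key=cost)
    history.append(cost(best))
    return best, history


if __name__ == "__main__":
    days = list(range(0, 200, 10))
    # Synthetic "satellite" observations generated from known parameters.
    observed = toy_crop_model(60.0, 0.1, days)
    best, history = run_ga(observed, days, [(0.0, 120.0), (0.01, 0.3)])
    print("estimated parameters:", best, "final RMSE:", history[-1])
```

In a per-pixel setting this whole search is repeated for every pixel, and each cost evaluation is a full crop model run, which is what makes the processing time so large.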
Rationale: Multi-computer distributed systems (clusters and Grids) offer large processing capacity at lower cost, so they are a natural choice for developing HPC applications. However, porting CAM to an HPC environment is not easy. Application performance is significantly affected by the data distribution and task distribution methods used on the HPC system, and developers of agriculture or satellite image processing applications must decide how to distribute data and tasks across a single cluster or multiple clusters. The HPC workload, the bandwidth, the processor speed, the parameters of the evaluation methods, and the data size are additional factors of concern. Moreover, interoperability between the agriculture application and existing RS image processing software is necessary to improve practicality; in existing CAM works, however, users must manually extract satellite data from databases. Some studies have focused separately on agricultural modeling or on satellite image processing performance on HPC, and only a few address the combined domain of satellite image processing, agriculture, and HPC. Even then, research in the combined domain is mainly hypothetical or conjectural rather than a practical implementation. Thus, agricultural researchers need a web-based system or tool for agricultural activity monitoring, so that they need not be concerned with the implementation issues of agricultural models or RS image processing on HPC.
Research Objectives: The main focus of the research is therefore to propose a new software system that supports agricultural activities using both Remote Sensing (RS) and HPC. Automatic processing of satellite images through HPC, an HPC implementation of CAM with appropriate data and task distribution schemes, and a web portal system on HPC are the vital sectors for modeling the overall distributed agriculture monitoring scheme, and each requires individual experiments. The research question is therefore: "Does the HPC system help to improve the performance in each sector separately?" In this study, HPC performance in each sector is examined experimentally with different target applications. These applications are implemented on an HPC platform for the first time, and their measured performance is found to be acceptable.
GRASS GIS (Geographic Resources Analysis Support System) is free, open source software used for RS and GIS data analysis and visualization. The GRASS module "r.vi" is used as a test example for experiments on HPC (r.vi.mpi, r.vi.grid) [12]. Moreover, a temporal spline procedure, Local Maximum Fitting (LMF), is improved to reduce processing time by combining data-parallel and task-parallel distribution on multi-cluster Grids [13]. CAM-GA is implemented on a Grid testbed, and the impact of the parallelization methods on performance is discussed. Additionally, a new CAM called CAM-PLDLGA is proposed and implemented [14]. A CAM-GA web interface is implemented with MapServer and PHP MapScript. Finally, an implementation methodology with the Python-based Web Processing Service (PyWPS), the Embrio interface, and MapServer is discussed for future extension.
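The data distribution idea for a per-pixel module such as r.vi can be sketched as follows. This is an illustration only, not the r.vi.mpi or r.vi.grid code from [12]: it assumes the vegetation index is NDVI, (NIR - Red) / (NIR + Red), splits the image into contiguous row blocks, and uses a Python thread pool as a stand-in for MPI ranks or Grid nodes. The function names split_rows, ndvi_rows, and parallel_ndvi are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def ndvi_rows(rows):
    """Compute NDVI for a chunk of (red_row, nir_row) pairs.

    Pixels where NIR + Red is zero are assigned 0.0 (an assumption).
    """
    out = []
    for red_row, nir_row in rows:
        out.append([
            (nir - red) / (nir + red) if (nir + red) != 0 else 0.0
            for red, nir in zip(red_row, nir_row)
        ])
    return out


def split_rows(n_rows, n_workers):
    """Return contiguous (start, stop) row ranges, as even as possible."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def parallel_ndvi(red, nir, n_workers=4):
    """Distribute row blocks to workers and reassemble the result in order."""
    pairs = list(zip(red, nir))
    chunks = [pairs[a:b] for a, b in split_rows(len(pairs), n_workers)]
    # Threads stand in for remote workers here; in an MPI or Grid setting
    # each chunk would be sent to a separate process or node instead.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(ndvi_rows, chunks))  # map keeps chunk order
    return [row for chunk in results for row in chunk]


if __name__ == "__main__":
    red = [[1.0, 2.0], [0.0, 3.0], [4.0, 1.0]]
    nir = [[3.0, 2.0], [0.0, 1.0], [4.0, 3.0]]
    print(parallel_ndvi(red, nir, n_workers=2))
```

Because each pixel is independent, the only coordination needed is scattering the row blocks and gathering the results in order. The thread pool shows the distribution pattern rather than a real speedup; the cost of moving blocks between nodes is what makes the choice of distribution method matter on a cluster or Grid.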
Shamim Akhter, 18-9-2009
P.S.: The complete research, "Study on High Performance Computing of Crop Parameter Identification with The Grid" (PhD dissertation, 2009, Information Processing, Tokyo Institute of Technology), will be sent upon e-mail request to Shamim Akhter (shamimakhter@gmail.com).