GRASS GSoC 2026: Parallelizing r.proj and Raster Processing Modules in GRASS

From GRASS-Wiki
Student Name: Kaushik Raja
Organization: NumFOCUS
Mentors: Huidae Cho, Anna Petrasova, Vaclav Petras
GitHub Fork: View Repo
LinkedIn Profile: View LinkedIn

Abstract

r.proj is one of the most commonly used GRASS modules. It reprojects raster maps between coordinate systems. Although it typically runs on multi-core hardware, it is entirely single-threaded, so most CPU cores sit idle and large raster reprojections take longer than necessary.

This project parallelizes r.proj using OpenMP. There are two main obstacles: GRASS's readcell tile cache is not thread-safe, and the PROJ library requires each thread to have its own context (PJ_CONTEXT). The solution is a two-path memory architecture: a fast RAM buffer for maps that fit in memory, and a thread-local tile cache for larger maps. The project will also resolve issue #5776, which reports that the progress reporting function G_percent is unsafe to call from parallel code.
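The two-path decision and a thread-local tile cache can be sketched in C with OpenMP as shown here. This is a minimal illustration, not the project's implementation. Every name in it is hypothetical and unrelated to the actual readcell.c API: choose_path, TileCache, cache_get, read_tile_from_disk, TILE, NSLOTS, and the threshold in MiB. The disk read is simulated by computing each cell value as row * 1000 + col, so the block runs on its own. Summing every cell stands in for per-cell resampling work. Each thread allocates its own cache inside the parallel region, so no locking is needed. The trade-off is that a tile touched by several threads may be read more than once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TILE 64   /* hypothetical tile edge length, in cells */
#define NSLOTS 16 /* hypothetical number of tiles each thread keeps */

typedef enum { PATH_RAM = 0, PATH_TILE_CACHE = 1 } MemPath;

/* Path A (RAM buffer) if the whole input fits under the user threshold in MiB,
   otherwise Path B (thread-local tile cache). Doubles avoid integer overflow. */
static MemPath choose_path(long rows, long cols, size_t cell_size, long threshold_mb)
{
    double bytes = (double)rows * (double)cols * (double)cell_size;
    double limit = (double)threshold_mb * 1024.0 * 1024.0;
    return bytes <= limit ? PATH_RAM : PATH_TILE_CACHE;
}

/* One direct-mapped cache per thread: tile_id[s] says which tile slot s holds. */
typedef struct {
    long tile_id[NSLOTS];
    double data[NSLOTS][TILE * TILE];
} TileCache;

/* Simulated disk read: fills a tile so that cell (row, col) holds row * 1000 + col. */
static void read_tile_from_disk(long trow, long tcol, double *dst)
{
    for (long i = 0; i < TILE; i++)
        for (long j = 0; j < TILE; j++)
            dst[i * TILE + j] = (double)(trow * TILE + i) * 1000.0 + (double)(tcol * TILE + j);
}

/* Returns cell (row, col); on a miss the slot's previous tile is evicted. */
static double cache_get(TileCache *tc, long row, long col, long tiles_per_row)
{
    long trow = row / TILE, tcol = col / TILE;
    long id = trow * tiles_per_row + tcol;
    int slot = (int)(id % NSLOTS);

    if (tc->tile_id[slot] != id) {
        read_tile_from_disk(trow, tcol, tc->data[slot]);
        tc->tile_id[slot] = id;
    }
    return tc->data[slot][(row % TILE) * TILE + (col % TILE)];
}

/* Path B: visit every cell through a cache private to each thread. */
static double sum_via_tile_cache(long rows, long cols)
{
    long tiles_per_row = (cols + TILE - 1) / TILE;
    double total = 0.0;

#pragma omp parallel reduction(+ : total)
    {
        /* Allocated inside the parallel region, so every thread owns one. */
        TileCache *tc = malloc(sizeof *tc);
        assert(tc != NULL);
        for (int s = 0; s < NSLOTS; s++)
            tc->tile_id[s] = -1;

#pragma omp for schedule(static)
        for (long r = 0; r < rows; r++)
            for (long c = 0; c < cols; c++)
                total += cache_get(tc, r, c, tiles_per_row);

        free(tc);
    }
    return total;
}
```

When the compiler is not in OpenMP mode, the pragmas are ignored and the same code builds as a serial program.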

The same RAM preload pattern applies directly to r.param.scale, a terrain analysis module whose sequential sliding buffer currently prevents row-level parallelism. r.geomorphon will also be parallelized as a third module. Together, these changes establish a reusable parallelization framework for future GRASS contributors.
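The RAM preload pattern rests on one observation. A sliding buffer carries state from one row to the next, so row r cannot start until the buffer has been advanced for row r - 1. If the whole input is first preloaded into a read-only array, each output row depends only on that array and can be computed in any order. The sketch below assumes this. A 3x3 mean stands in for r.param.scale's actual per-window computation, and window_mean_row and window_mean_parallel are hypothetical names. Each row is stored in its own slot of a row-indexed output buffer (row r at out + r * cols), so no two threads write to the same memory.

```c
#include <assert.h>

/* One output row from the preloaded, read-only input: 3x3 mean,
   skipping neighbors that fall outside the map. */
static void window_mean_row(const double *in, long rows, long cols, long r, double *out_row)
{
    for (long c = 0; c < cols; c++) {
        double sum = 0.0;
        int n = 0;
        for (long dr = -1; dr <= 1; dr++) {
            for (long dc = -1; dc <= 1; dc++) {
                long rr = r + dr, cc = c + dc;
                if (rr < 0 || rr >= rows || cc < 0 || cc >= cols)
                    continue;
                sum += in[rr * cols + cc];
                n++;
            }
        }
        out_row[c] = sum / n; /* n >= 1: the center cell is always inside */
    }
}

/* No state is carried between rows, so rows may run in any order.
   Row r is stored at out + r * cols (a row-indexed output buffer). */
static void window_mean_parallel(const double *in, long rows, long cols, double *out)
{
    assert(rows > 0 && cols > 0);
#pragma omp parallel for schedule(dynamic, 16)
    for (long r = 0; r < rows; r++)
        window_mean_row(in, rows, cols, r, out + r * cols);
}
```

In a GRASS module, the filled output buffer would then be written sequentially, starting with row 0, because Rast_put_row writes rows in order and takes no row index.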

Project Scope

  1. Implement the two-path memory architecture for r.proj: RAM buffer for maps within the memory threshold, thread-local tile caches for larger maps
  2. Implement per-thread PJ_CONTEXT initialization for PROJ library thread safety
  3. Fix issue #5776: replace unsafe G_percent calls in parallel code with an atomic counter and master-thread-only progress reporting
  4. Add a user-controlled memory threshold parameter
  5. Parallelize r.param.scale by replacing the sequential sliding buffer with a RAM preload pattern
  6. Parallelize r.geomorphon as a third module
  7. Write pixel parity regression tests and scalability benchmarks across multiple core counts
  8. Document all parallelized modules
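The progress-reporting fix in item 3 can be sketched as follows. It assumes the pattern described there: a shared counter updated with an OpenMP atomic capture, and output printed only by thread 0, the master thread. report_progress is a hypothetical stand-in for G_percent so that the example compiles without GRASS; it is not a GRASS function. The "last printed percentage" is kept in a variable used only by the master thread. This avoids the shared mutable state that makes unsynchronized calls from many threads unsafe.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for G_percent(): prints a percentage to stderr. */
static void report_progress(long pct)
{
    fprintf(stderr, "%3ld%%\r", pct);
}

/* Processes nrows rows in parallel and returns how many were counted as done. */
static long run_rows_with_progress(long nrows)
{
    long done = 0; /* shared counter, only ever updated atomically */

#pragma omp parallel
    {
        int tid = 0;
        long last_pct = -1; /* private; meaningful only on the master thread */
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif

#pragma omp for schedule(dynamic)
        for (long r = 0; r < nrows; r++) {
            /* ... per-row work for row r would go here ... */
            long now;
#pragma omp atomic capture
            now = ++done;

            if (tid == 0) { /* thread 0 is the master thread */
                long pct = now * 100 / nrows;
                if (pct != last_pct) {
                    report_progress(pct);
                    last_pct = pct;
                }
            }
        }
    }

    assert(done == nrows); /* every row counted exactly once */
    report_progress(100);  /* master may have finished its rows early */
    fputc('\n', stderr);
    return done;
}
```

With dynamic scheduling, the master thread may run out of rows before the others finish. For that reason the final 100% is printed after the parallel region.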

Timeline

Period | Timeline | Tasks | Status
Community Bonding Period | May 1 - May 25
  1. Thread-safety audit of gprojects library calls
  2. Study readcell.c internals
  3. Set up benchmarking infrastructure
  4. Source audit of r.param.scale and r.geomorphon
  5. Finalize Linux dev environment
  6. Confirm benchmarks are reproducible
  7. Agree on final implementation details with mentors
Official Coding Period | May 25 - June 8
  1. Benchmark Path A (RAM buffer) vs Path B (thread-local tile cache)
  2. Wire HAVE_OPENMP into the configure system
  3. Implement per-thread PJ_CONTEXT
  4. Implement user-controlled memory threshold
June 9 - June 22
  1. Indexed row output buffer
  2. Fix G_percent thread safety (issue #5776)
  3. Pixel parity regression tests for nearest-neighbor
June 23 - July 6
  1. Thread-local cache allocation
  2. Two-path decision logic
  3. r.proj complete
July 7 - July 11
  1. Submit midterm evaluation
July 7 - July 20
  1. Test Path B on large maps
  2. Parallelize bilinear, bicubic, lanczos kernels
  3. Scalability benchmarks (1/2/4/8/16 cores)
July 21 - August 3
  1. Production-quality r.param.scale parallelization
  2. Regression tests for all r.param.scale methods
August 4 - August 11
  1. r.geomorphon parallelization
August 12 - August 18
  1. Pre-commit fixes, manual pages, final PR polish
  2. Blog post for OSGeo Planet
Final Week | August 19 - August 26
  1. Submit final evaluation
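The pixel parity regression tests in the plan compare parallel output against serial output. The sketch below is minimal and assumes a simplified nearest-neighbor resampler on a regular grid, not r.proj's actual back-projection through PROJ. The function names are hypothetical. Both versions call the same per-row function, and the results are compared bit for bit with memcmp. A bitwise comparison is stricter than comparing values with ==, and it also treats identical NaN cells as equal, which == would not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Nearest-neighbor resampling of one output row on a regular grid: each output
   cell takes the input cell whose extent contains the output cell's center. */
static void nn_row(const double *in, long in_rows, long in_cols,
                   long out_rows, long out_cols, long r, double *out_row)
{
    /* Casting a non-negative double to long truncates, i.e. floors it. */
    long sr = (long)((r + 0.5) * (double)in_rows / (double)out_rows);
    if (sr >= in_rows)
        sr = in_rows - 1;
    for (long c = 0; c < out_cols; c++) {
        long sc = (long)((c + 0.5) * (double)in_cols / (double)out_cols);
        if (sc >= in_cols)
            sc = in_cols - 1;
        out_row[c] = in[sr * in_cols + sc];
    }
}

static void nn_serial(const double *in, long in_rows, long in_cols,
                      long out_rows, long out_cols, double *out)
{
    for (long r = 0; r < out_rows; r++)
        nn_row(in, in_rows, in_cols, out_rows, out_cols, r, out + r * out_cols);
}

static void nn_parallel(const double *in, long in_rows, long in_cols,
                        long out_rows, long out_cols, double *out)
{
#pragma omp parallel for schedule(dynamic)
    for (long r = 0; r < out_rows; r++)
        nn_row(in, in_rows, in_cols, out_rows, out_cols, r, out + r * out_cols);
}

/* Returns 1 if serial and parallel outputs are identical bit for bit, else 0. */
static int pixel_parity(const double *in, long in_rows, long in_cols,
                        long out_rows, long out_cols)
{
    assert(in_rows > 0 && in_cols > 0 && out_rows > 0 && out_cols > 0);
    size_t n = (size_t)out_rows * (size_t)out_cols;
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    int same = 0;

    if (a != NULL && b != NULL) {
        nn_serial(in, in_rows, in_cols, out_rows, out_cols, a);
        nn_parallel(in, in_rows, in_cols, out_rows, out_cols, b);
        same = memcmp(a, b, n * sizeof *a) == 0;
    }
    free(a);
    free(b);
    return same;
}
```

Running the same check at several thread counts (for example via OMP_NUM_THREADS) exercises different row-to-thread assignments against the same serial reference.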

Reports

Community Bonding Period

Coding Period

Log of Pull Requests

  • PR #7185 - r.proj OpenMP parallelization proof of concept
  • PR #7236 - r.param.scale parallelization proof of concept