GRASS GSoC 2024 Add JSON output

From GRASS-Wiki

Latest revision as of 20:31, 22 August 2024

Accepted Google Summer of Code 2024 project.

Student Name: Kriti Birda
Organization: OSGeo - Open Source Geospatial Foundation
Mentor Name: Corey White and Vaclav Petras
Title: Add JSON output to different GRASS tools in C
Abstract
At the moment, most of the tools in GRASS produce custom, human-readable output in plain text. Some of these modules could benefit from also providing their output in a portable, commonly used data format. The aim of my project is to use the parson library (a lightweight JSON library written in C) in various tools so that they can produce JSON output. The JSON output will be accompanied by Python test cases that verify it works as intended and prevent regressions in the future. An option to specify the desired output format (plain or JSON) will also be added to each updated tool. The layout of the JSON output will be discussed with the mentors prior to implementation and will be optimized for easy ingestion with pandas.

Pull Requests

Module            | PR title                                       | PR link                                   | Status at end of GSoC period
lib               | add standard parser option for JSON formatting | https://github.com/OSGeo/grass/pull/3704/ | merged
r.info            | add JSON output                                | https://github.com/OSGeo/grass/pull/3744/ | merged
v.info            | add JSON output                                | https://github.com/OSGeo/grass/pull/3755/ | merged
r.univar          | add JSON output                                | https://github.com/OSGeo/grass/pull/3783/ | merged
v.univar          | add JSON output                                | https://github.com/OSGeo/grass/pull/3784/ | merged
r.profile         | add JSON output                                | https://github.com/OSGeo/grass/pull/3872/ | merged
r.stats           | add JSON output                                | https://github.com/OSGeo/grass/pull/3884/ | open
r.report          | add JSON output                                | https://github.com/OSGeo/grass/pull/3935/ | merged
g.region          | add JSON output                                | https://github.com/OSGeo/grass/pull/3941/ | merged
v.distance        | add JSON output                                | https://github.com/OSGeo/grass/pull/3942/ | open
r.category        | add JSON output                                | https://github.com/OSGeo/grass/pull/4018/ | merged
v.category        | add JSON output                                | https://github.com/OSGeo/grass/pull/4020/ | open
db.describe       | add JSON output                                | https://github.com/OSGeo/grass/pull/4021/ | merged
v.to.db           | add JSON output                                | https://github.com/OSGeo/grass/pull/4036/ | open
g.proj            | add JSON output                                | https://github.com/OSGeo/grass/pull/4104/ | open
r.object.geometry | add JSON output                                | https://github.com/OSGeo/grass/pull/4105/ | merged
g.region          | fix ruff lint error in tests                   | https://github.com/OSGeo/grass/pull/4167/ | merged
db.describe       | fix illegal memory access report               | https://github.com/OSGeo/grass/pull/4202/ | merged

Reports

Introduction: https://discourse.osgeo.org/t/gsoc-2024-introduction-juno/28253

Community Bonding Period: https://discourse.osgeo.org/t/gsoc-2024-week-0-report-add-json-support-to-grass-modules/28299

Week 1: https://discourse.osgeo.org/t/gsoc-2024-week-1-report-add-json-support-to-grass-modules/30673
Week 2: https://discourse.osgeo.org/t/gsoc-2024-week-2-report-add-json-support-to-grass-modules/30764
Week 3: https://discourse.osgeo.org/t/gsoc-2024-week-3-report-add-json-support-to-grass-modules/30791
Week 4: https://discourse.osgeo.org/t/gsoc-2024-week-4-report-add-json-support-to-grass-modules/30834
Week 5: https://discourse.osgeo.org/t/gsoc-2024-week-5-report-add-json-support-to-grass-modules/30882
Week 6: https://discourse.osgeo.org/t/gsoc-2024-week-6-report-add-json-support-to-grass-modules/30906
Week 7: https://discourse.osgeo.org/t/gsoc-2024-week-7-report-add-json-support-to-grass-modules/30946
Week 8: https://discourse.osgeo.org/t/gsoc-2024-week-8-report-add-json-support-to-grass-modules/30993
Week 9: https://discourse.osgeo.org/t/gsoc-2024-week-9-report-add-json-support-to-grass-modules/31007
Week 10: https://discourse.osgeo.org/t/gsoc-2024-week-10-report-add-json-support-to-grass-modules/49643
Week 12: https://discourse.osgeo.org/t/gsoc-2024-week-12-report-add-json-support-to-grass-modules/49735

Final: https://discourse.osgeo.org/t/gsoc-2024-final-report-add-json-output-to-different-tools-in-c/49784

Final Report

The State of the Art Before GSoC

Before this project, most GRASS GIS tools produced plain-text output only, which had to be parsed or converted manually before it could be used in other software. A few modules already supported JSON, but inconsistently: each enabled it through its own flag or option. This made it hard to automate tasks or to feed GRASS GIS output directly into modern data processing pipelines.

The Addition (Added Value) That My Project Brought to the Software

The project added JSON output support to 16 GRASS GIS tools. Users can now choose the output format (plain text or JSON), which makes it easier to feed results into data analysis tools and workflows. The project also standardized the options of tools that already supported JSON, improving consistency across GRASS GIS. New Python test cases for these outputs verify that they work as intended and guard against regressions in future changes.
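The actual test cases are part of the pull requests listed above and are not reproduced here. As a rough, standalone illustration of the kind of check such a test makes, the sketch below validates the r.category JSON layout shown in the examples on this page. The helper name `check_category_records` is hypothetical, and the expected layout (a list of objects with an integer `category` and a string `description`) is inferred from that example output, not from a formal schema.

```python
import json


def check_category_records(text: str) -> list[dict]:
    """Parse r.category-style JSON and check its layout (hypothetical helper)."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array at the top level")
    for record in records:
        # Assumed layout: exactly these two keys per record
        if set(record) != {"category", "description"}:
            raise ValueError(f"unexpected keys: {sorted(record)}")
        if not isinstance(record["category"], int):
            raise ValueError("category must be an integer")
        if not isinstance(record["description"], str):
            raise ValueError("description must be a string")
    return records


# First two records of the r.category example output
sample = '[{"category": 1, "description": "CARY"}, {"category": 2, "description": "GARNER"}]'
print(len(check_category_records(sample)))
```

A check like this catches layout changes (renamed keys, numbers emitted as strings) that a plain "does it parse" test would miss.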

Potential Future Work

JSON support for 4 modules is still a work in progress and should be completed soon. Further work is needed to extend JSON output to the remaining GRASS GIS tools. Future developers can build on this foundation by adding JSON output to more modules or by extending the JSON schema to support more complex use cases.

Examples

Using r.category JSON output with Python

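import json

import grass.script as gs

# r.category selects JSON through output_format= (not format=)
output = gs.read_command(
    "r.category",
    map="towns",
    output_format="json",
)
categories = json.loads(output)
# Pretty-print the parsed list of {"category", "description"} records
print(json.dumps(categories, indent=4))

Output: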
[
    {
        "category": 1,
        "description": "CARY"
    },
    {
        "category": 2,
        "description": "GARNER"
    },
    {
        "category": 3,
        "description": "APEX"
    },
    {
        "category": 4,
        "description": "RALEIGH-CITY"
    },
    {
        "category": 5,
        "description": "RALEIGH-SOUTH"
    },
    {
        "category": 6,
        "description": "RALEIGH-WEST"
    }
]
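Because the output is a list of records that all share the same two keys, it converts directly into a lookup table from category number to label. A minimal sketch using only the standard library, assuming the record layout shown above; `category_lookup` is a hypothetical helper name.

```python
import json


def category_lookup(text: str) -> dict[int, str]:
    """Map category numbers to descriptions from r.category JSON output."""
    return {rec["category"]: rec["description"] for rec in json.loads(text)}


# A shortened copy of the r.category output shown above
output = """[
    {"category": 1, "description": "CARY"},
    {"category": 4, "description": "RALEIGH-CITY"}
]"""
labels = category_lookup(output)
print(labels[4])  # RALEIGH-CITY
```

Inside a GRASS session, the `output` string from `gs.read_command(...)` in the example above could be passed in place of the hard-coded sample.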

Using r.profile JSON output with pandas and Matplotlib

import io

import grass.script as gs
import matplotlib.pyplot as plt
import pandas as pd

# Run r.profile: -g adds easting/northing, -c adds RGB color columns
elevation = gs.read_command(
    "r.profile",
    input="elevation",
    coordinates="641712,226095,641546,224138,641546,222048,641049,221186",
    format="json",
    flags="gc",
)

# pandas >= 2.1 deprecates passing literal JSON strings, so wrap the text in StringIO
df = pd.read_json(io.StringIO(elevation))

# Convert the RGB color values to hex strings for Matplotlib
df["color"] = df.apply(
    lambda row: "#{:02x}{:02x}{:02x}".format(
        int(row["red"]), int(row["green"]), int(row["blue"])
    ),
    axis=1,
)
print(df)

# Create the scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(df["distance"], df["elevation"], c=df["color"], marker="o")
plt.title("Profile of Distance vs. Elevation with Color Coding")
plt.xlabel("Distance (meters)")
plt.ylabel("Elevation")
plt.grid(True)
plt.show()

Output of print(df):
          easting       northing     distance   elevation  red  green  blue    color
0   641712.000000  226095.000000     0.000000   84.530815  111    255     0  #6fff00
1   641669.739905  225596.789117   500.000000   97.633720  255    244     0  #fff400
2   641627.479809  225098.578233  1000.000000  104.868874  255    198     0  #ffc600
3   641585.219714  224600.367350  1500.000000   97.171303  255    247     0  #fff700
4   641546.000000  224138.000000  1964.027749   81.972504   79    255     0  #4fff00
5   641546.000000  223638.000000  2464.027749   72.764458    0    245    29  #00f51d
6   641546.000000  223138.000000  2964.027749   80.820168   64    255     0  #40ff00
7   641546.000000  222638.000000  3464.027749   71.326347    0    241    42  #00f12a
8   641546.000000  222138.000000  3964.027749   71.669518    0    242    39  #00f227
9   641546.000000  222048.000000  4054.027749   71.669518    0    242    39  #00f227
10  641296.254788  221614.840296  4554.027749   78.522743   35    255     0  #23ff00
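The hex conversion inside the `apply` call above can also be written as a small standalone function, which is easier to test and reuse. A minimal sketch; `rgb_to_hex` is a hypothetical helper, and it assumes 8-bit channel values (0 to 255) as in the r.profile output above.

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Return a Matplotlib-compatible '#rrggbb' string for 8-bit RGB values."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError(f"channel value out of range: {value}")
    return f"#{red:02x}{green:02x}{blue:02x}"


# Rows 0 and 5 of the r.profile output above
print(rgb_to_hex(111, 255, 0))  # #6fff00
print(rgb_to_hex(0, 245, 29))  # #00f51d
```

With pandas, the function could replace the lambda, for example `df["color"] = [rgb_to_hex(int(r), int(g), int(b)) for r, g, b in zip(df["red"], df["green"], df["blue"])]`.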

Using r.info JSON output with pandas, including session setup

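import subprocess
import sys

import pandas as pd

# Make the GRASS Python packages importable from a plain Python session
sys.path.append(
    subprocess.check_output(["grass", "--config", "python_path"], text=True).strip()
)

import grass.script as gs

# Start a GRASS session in the given project
gs.setup.init("~/grassdata/nc_spm_08_grass7/")

data = gs.parse_command("r.info", map="elevation", format="json")
print(data["cells"])

Output:

2025000

print(pd.DataFrame([data]).T)

Output (abridged):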
...
north                                                      228500
south                                                      215000
nsres                                                          10
east                                                       645000
west                                                       630000
ewres                                                          10
...
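The region values in this output are consistent with the cell count printed earlier: the number of rows is (north - south) / nsres = 13500 / 10 = 1350, the number of columns is (east - west) / ewres = 15000 / 10 = 1500, and 1350 × 1500 = 2025000. The sketch below does that arithmetic on a dictionary holding just those six keys; it assumes the values are numbers (as the output above suggests), and `cell_count` is a hypothetical helper.

```python
def cell_count(info: dict) -> int:
    """Compute rows * columns from r.info-style region keys."""
    # round() guards against floating-point noise in the divisions
    rows = round((info["north"] - info["south"]) / info["nsres"])
    cols = round((info["east"] - info["west"]) / info["ewres"])
    return rows * cols


# Values copied from the r.info output above
region = {
    "north": 228500,
    "south": 215000,
    "nsres": 10,
    "east": 645000,
    "west": 630000,
    "ewres": 10,
}
print(cell_count(region))  # 2025000
```

In a live session, `cell_count(data)` should agree with `data["cells"]` for maps like this one.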

Using r.info JSON output with jq in interactive shell

$ r.info lakes format=json | jq '.title'
"South-West Wake county: Wake county lakes"
$ r.info elevation format=json | jq .min,.max
55.578792572021484
156.32986450195312
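Where jq is not installed, a few lines of Python can do the same top-level key selection on the JSON that r.info prints. A minimal sketch; `select_keys` is a hypothetical helper, and the sample string is a trimmed-down stand-in for the full r.info output, keeping only the two values shown above.

```python
import json


def select_keys(text: str, keys: list[str]) -> list:
    """Return the values of top-level keys, similar to `jq .a,.b`."""
    data = json.loads(text)
    # A missing key yields None, much like jq printing null
    return [data.get(key) for key in keys]


# Trimmed-down stand-in for `r.info elevation format=json` output
sample = '{"min": 55.578792572021484, "max": 156.32986450195312}'
for value in select_keys(sample, ["min", "max"]):
    # json.dumps keeps strings quoted, like jq's default output
    print(json.dumps(value))
```

To read from a pipe instead, pass `sys.stdin.read()` as `text`.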

Using r.info JSON output with jq outside a GRASS session

$ grass-dev --tmp-mapset ~/grassdata/nc_spm_08_grass7/ --exec r.info map=elevation format=json | jq '.title'
"South-West Wake county: Elevation NED 10m"
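The same non-interactive call can be made from Python with the standard `subprocess` module, without a shell or jq. A sketch under stated assumptions: the executable is called `grass` by default (the command above uses `grass-dev`; adjust the name to your installation), and only the tool's JSON is written to standard output. `grass_exec_args` and `read_json_from_grass` are hypothetical helper names. Because no shell is involved, `~` is expanded explicitly with `os.path.expanduser`.

```python
import json
import os
import subprocess


def grass_exec_args(
    project: str, tool: str, executable: str = "grass", **params: str
) -> list[str]:
    """Build `grass --tmp-mapset PROJECT --exec TOOL key=value ...` as a list."""
    args = [executable, "--tmp-mapset", os.path.expanduser(project), "--exec", tool]
    args += [f"{key}={value}" for key, value in params.items()]
    return args


def read_json_from_grass(project: str, tool: str, **params: str):
    """Run a tool in a temporary mapset and parse its JSON output."""
    result = subprocess.run(
        grass_exec_args(project, tool, **params),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if GRASS or the tool fails
    )
    return json.loads(result.stdout)


args = grass_exec_args(
    "/data/grassdata/nc_spm_08_grass7", "r.info", map="elevation", format="json"
)
print(" ".join(args))
```

With GRASS installed, `read_json_from_grass("~/grassdata/nc_spm_08_grass7/", "r.info", map="elevation", format="json")["title"]` would be the Python counterpart of the jq pipeline above.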