GitHub-Zenodo linkage

GitHub - Zenodo Integration for GRASS GIS

Scope

This wiki page describes the linkage between the GRASS GIS repository on GitHub and the data archive Zenodo. It covers why this linkage is relevant for the GRASS GIS community, hands-on advice from the Zenodo helpdesk on how to set it up, and strategic information about future development paths. Parts of this are still work in progress (scientific-grade automated software citation by persistent identifiers).

Introducing Zenodo

Zenodo is a general-purpose open-access repository. It is developed under the European OpenAIRE program (a network of Open Access repositories, archives and journals that support Open Access policies) and operated by CERN. The services provided by Zenodo are based on an Open Source software stack. To its users, including the GRASS GIS developer team, Zenodo provides long-term archiving of digital content, including software.

Benefits for the GRASS GIS community

Long term archiving

Zenodo stores the archived data (the GRASS GIS codebase) in the CERN Data Center. Both data files and metadata are kept in multiple online and independent replicas. CERN has considerable knowledge and experience in building and operating large-scale digital repositories and a commitment to maintain this data centre to collect and store hundreds of petabytes of LHC data as it grows over the next 20 years. In the highly unlikely event that Zenodo has to close operations, it is guaranteed that all content will be migrated to other suitable repositories, and since all uploads have DOIs, all citations and links to Zenodo resources (such as the GRASS GIS codebase) will not be affected. (Source: Zenodo FAQ)

Scientific citation by DOI

Currently, the GRASS GIS web presence recommends citation only via BibTeX.

All content archived in Zenodo receives a persistent identifier, a DOI (Digital Object Identifier). A DOI can be resolved in any web browser. Unlike a URL, which can break and return a 404 error, a DOI is designed never to expire. The technical infrastructure for this is provided by the non-profit organisation DataCite. DOIs are cited in scientific literature much like ISBN or ISSN numbers.

A DOI link resolves to a so-called "landing page", an HTML page with metadata (both human and machine readable) and a link to the archived digital content (e.g. the GRASS GIS codebase stored in Zenodo).
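
To illustrate how a DOI becomes a link that resolves to its landing page, here is a minimal Python sketch. It assumes the public resolver at https://doi.org/; the function names and the DOI used in the example are hypothetical placeholders, not the actual GRASS GIS DOI.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in citations; all are reduced to the bare DOI.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Return the bare DOI (e.g. '10.5281/zenodo.0000000') from a citation form."""
    doi = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with '10.' and separates prefix and suffix with '/'.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return doi


def doi_to_url(doi: str) -> str:
    """Build the resolver link that redirects to the DOI landing page."""
    return DOI_RESOLVER + quote(normalize_doi(doi), safe="/")


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    print(doi_to_url("doi:10.5281/zenodo.0000000"))
```

Opening the printed link in a browser would redirect to the landing page of the corresponding record, provided the DOI exists.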

The "Zenodo page" for the GRASS GIS codebase is its DOI landing page.

Zenodo provides a service to render citation strings based on the landing page metadata in several hundreds of formatting styles, including Research Information Systems (RIS).
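
Citation strings can also be requested programmatically. The following Python sketch uses DOI content negotiation, which is a feature of the DOI resolver and the registration agency rather than of the Zenodo landing page itself. It only builds the HTTP request and does not send it. The function name, the `fmt` and `style` parameters and the placeholder DOI are assumptions made for illustration, and whether a given media type or style is honoured depends on the registration agency.

```python
from typing import Optional
from urllib.request import Request

# Media types commonly used with DOI content negotiation (assumed to be
# supported for the DOI in question).
CITATION_MEDIA_TYPES = {
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
}


def build_citation_request(doi: str, fmt: str = "bibtex",
                           style: Optional[str] = None) -> Request:
    """Prepare (but do not send) a request for a citation of the given DOI.

    fmt   -- 'bibtex', 'ris', or 'text' for a formatted citation string
    style -- optional citation style name, only used when fmt == 'text'
    """
    if fmt == "text":
        accept = "text/x-bibliography"
        if style:
            accept += f"; style={style}"
    elif fmt in CITATION_MEDIA_TYPES:
        accept = CITATION_MEDIA_TYPES[fmt]
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return Request(f"https://doi.org/{doi}", headers={"Accept": accept})


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    req = build_citation_request("10.5281/zenodo.0000000", fmt="text", style="apa")
    print(req.full_url, req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; this is left out here so that the sketch runs without network access.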

The GRASS GIS codebase has one "umbrella" DOI minted for it by Zenodo. For each GRASS GIS release, an additional version-specific DOI is issued through DOI versioning. This makes it possible to cite both the GRASS GIS codebase in general (via the umbrella DOI) and a particular release (via its version DOI).
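
The relationship between the umbrella DOI and the version DOIs can be pictured as a small lookup structure. The Python sketch below is only an illustration: the class name, the method name and all DOI values are hypothetical placeholders and do not reflect how Zenodo stores this information.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SoftwareDois:
    """One umbrella DOI plus one DOI per archived release."""
    umbrella_doi: str
    version_dois: Dict[str, str] = field(default_factory=dict)

    def doi_for(self, version: Optional[str] = None) -> str:
        """Return the umbrella DOI, or the DOI of one specific release."""
        if version is None:
            # Citing the software in general.
            return self.umbrella_doi
        try:
            # Citing one particular release.
            return self.version_dois[version]
        except KeyError:
            raise KeyError(f"no DOI recorded for version {version!r}") from None


if __name__ == "__main__":
    # Placeholder DOIs and version labels for illustration only.
    dois = SoftwareDois(
        umbrella_doi="10.5281/zenodo.0000000",
        version_dois={"X.0": "10.5281/zenodo.0000001",
                      "X.1": "10.5281/zenodo.0000002"},
    )
    print(dois.doi_for())        # umbrella DOI
    print(dois.doi_for("X.1"))   # DOI of one release
```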

Usage statistics

GitHub - Zenodo Integration

Generic Documentation

GRASS GIS specific Information

The following feedback was provided by the Zenodo helpdesk in Q1/Q2 2019 in advance of the GRASS GitHub migration:

Many DOIs for individual GRASS modules, or rather one DOI for the GRASS GIS software framework?

"Yes, in principle it's possible to issue DOIs for all modules, but I don't think this is useful. You mention e.g. that it would be nice to cite both the overall system as well as individual modules. This will essentially "dilute" the citations over many DOIs and thus the citation count for GRASS will seem a lot lower than it actually is. Thus for getting credit, it's better to have one DOI per major version of GRASS (where each version can have an updated author list). Also, having many DOIs makes it very difficult for discovery systems to track the citations automatically. Essentially Zenodo is the first system where we can actually aggregate citations for all versions and a specific version of software."

"If you want to give credit to individual modules, I think it's better to then simply mention it. For instance, the journal text could mention it used module X, and the landing page of DOI for GRASS, could simply have a description detailing who did which module."

"We have now collected some 5000 citations to software in Zenodo, and what we can see for the top cited packages is that if the project provides a "citation recommendation" then people actually follow it. Example:"

How to integrate previous releases of GRASS GIS into Zenodo

Q: If possible, all GRASS releases should be made available for scientific citation via Zenodo (overview over all releases here:

The oldest GRASS release that has been preserved as a tarball predates software versioning (GRASS 4.3 from 1999). All later releases are currently available in the current SVN repo (https://trac.osgeo.org/grass/browser/grass/#branches).

So to be able to get a DOI for each major version of GRASS, as advised in your previous response, can you please recommend an approach for feeding the sequence of previous major GRASS releases via GitHub into Zenodo, so that the DOIs for the releases are in the proper sequence (the DOI for GRASS 5.x referencing the newer releases of GRASS 6.x, 7.x, etc.)?

Answer: If you want to keep the history of releases in order, I would suggest you upload them in order of the release date (starting with the oldest one). Nevertheless, please notice that Zenodo sorts the releases by upload date and not by a version number. This means that any future release (e.g. minor versions of previous releases) will be simply the next one in the list by DateTime.... I hope I manage to explain it :)
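
The helpdesk's advice can be sketched in a few lines of Python: sort the releases by release date to get the upload order, and note that a later maintenance release of an older series ends up last in the list, regardless of its version number. The `Release` type, the function name, the version labels and the dates are hypothetical placeholders, not actual GRASS GIS releases.

```python
from datetime import date
from typing import List, NamedTuple


class Release(NamedTuple):
    version: str
    released: date


def upload_order(releases: List[Release]) -> List[Release]:
    """Order releases oldest first, which is the order to upload them in."""
    return sorted(releases, key=lambda r: r.released)


if __name__ == "__main__":
    # Placeholder versions and dates for illustration only.
    history = [
        Release("2.0", date(2005, 1, 1)),
        Release("1.0", date(2000, 1, 1)),
        Release("3.0", date(2010, 1, 1)),
    ]
    ordered = upload_order(history)
    print([r.version for r in ordered])

    # A maintenance release of the old 1.x series published later is simply
    # appended: the list follows upload time, not the version number.
    ordered.append(Release("1.1", date(2011, 1, 1)))
    print([r.version for r in ordered])
```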

Loose ends and options for future extensions

  • In scientific citations of the GRASS GIS project, due credit is given to the "GRASS GIS developer team" as a group. The membership of this group has changed over time as new members joined and others departed. Members also take on different roles within the group, which can likewise change over time. Some members chose to remain anonymous. There currently seems to be no easy mechanism to determine who the members of the GRASS GIS developer team were at a specific point in time.
  • The Zenodo account which contains the archived GRASS GIS software releases is associated with the GRASS GIS developer team. The respective DOIs (and DOI versions) for all GRASS GIS releases archived in Zenodo are therefore "owned" by the developer team as a whole, but cannot be claimed by individual members of the development team (with their respective ORCID iDs). This is an unresolved problem. Best-practice examples are needed to ensure that members of the development team can receive their due recognition.
  • Credit by software citation for members of the GRASS developer team should include recognition of the different roles (e.g. original author, code maintenance, porting efforts, documentation, bug fixing, etc.). To some extent, the classifications for relators developed by the Library of Congress could be used, as is being done by the R community (Details). This has not been addressed in the context of GRASS GIS.
  • DOIs can be bundled or tied together (which is already implemented in Zenodo). Best practices are needed for the GRASS community on how to use this (e.g. linking DOIs for scientific papers, code, data, documentation, videos, etc.).
  • If the GRASS or OSGeo communities should decide in the future to become DOI-minting entities (by signing a contract with DataCite), it is feasible to link the already existing Zenodo DOIs for the GRASS GIS releases to "new" OSGeo (or GRASS GIS) DOIs. From the DOI perspective, there is no "vendor lock-in" from using the Zenodo archive.