GitHub - Zenodo Integration for GRASS GIS
This wiki page describes the link between the GRASS GIS repository on GitHub and the data archive Zenodo. It covers why this link matters for the GRASS GIS community, hands-on advice from the Zenodo helpdesk on how to set it up, and strategic information about future development paths, since scientific-grade automated software citation by persistent identifiers is still partly work in progress.
Zenodo is a general-purpose open-access repository for scientific information. It is developed under the European OpenAIRE program (a network of Open Access repositories, archives and journals that support Open Access policies) and operated by CERN. The services provided by Zenodo are based on an Open Source software stack. To users, including the GRASS GIS developer team, Zenodo provides long-term archiving of digital content, including software. Every user can upload up to 50 GB of digital scientific content.
- Zenodo policies
- Zenodo Documentation: Software Deposit - Guidance for Researchers
- Software citations now available in Zenodo
Benefits for the GRASS GIS community
Long term archiving
Zenodo stores the archived data (GRASS GIS codebase) in the CERN Data Center. Both data files and metadata are kept in multiple online and independent replicas. CERN has considerable knowledge and experience in building and operating large-scale digital repositories and a commitment to maintain this data centre to collect and store 100s of PBs of LHC data as it grows over the next 20 years. In the highly unlikely event that Zenodo has to close operations, it is guaranteed that all content will be migrated to other suitable repositories, and since all uploads have DOIs, all citations and links to Zenodo resources (GRASS GIS codebase) will not be affected. (Source: Zenodo FAQ)
Scientific citation by DOI
Currently, the GRASS GIS web presence recommends citation only via BibTeX.
All content archived in Zenodo receives a persistent identifier, a DOI (Digital Object Identifier). DOIs can be opened in a web browser just like URLs. Unlike a URL, which can break (-> 404 error), a DOI is designed never to expire. The technical infrastructure for this is provided by the non-profit organisation DataCite. DOIs are cited in scientific literature much like ISBN or ISSN numbers.
A DOI link resolves to a so-called "landing page", an HTML page with metadata (both human- and machine-readable) and a link to the archived digital content (e.g. the GRASS GIS codebase stored in Zenodo).
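As a minimal sketch of how a DOI becomes a browser link: the DOI name is appended to the public resolver at https://doi.org/. The function name `doi_to_url` and the record number in the example are placeholders chosen for illustration (10.5281 is the DOI prefix used by Zenodo); this is not the DOI of any GRASS GIS release.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI given as a bare name or as 'doi:...'."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Very rough sanity check: DOI names look like '10.<registrant>/<suffix>'.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Keep the slash between prefix and suffix, percent-encode unsafe characters.
    return DOI_RESOLVER + quote(doi, safe="/")


# Placeholder DOI: the record number is made up for this example.
EXAMPLE_DOI = "10.5281/zenodo.0000000"
print(doi_to_url(EXAMPLE_DOI))
```

Opening the resulting URL in a browser redirects to the landing page of the record, provided the DOI actually exists.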
The "Zenodo page" for the GRASS GIS codebase is its DOI landing page.
Zenodo provides a service to render citation strings based on the landing page metadata in several hundred formatting styles, including Research Information Systems (RIS).
The GRASS GIS codebase has one "umbrella" DOI minted for it by Zenodo. Through DOI versioning, each GRASS GIS release receives its own version DOI under this umbrella. This makes it possible to cite both the GRASS GIS codebase in general (-> "umbrella DOI") and a particular release (by its version DOI).
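Formatted citation strings can also be requested programmatically. The sketch below does not use Zenodo's own citation export; it assumes the DOI content negotiation offered through the doi.org resolver, where the `Accept` header selects the output (a formatted bibliography entry in a named style, or BibTeX). The function names and the DOI are placeholders, and the requests are only built here, not sent.

```python
import urllib.request

# Placeholder DOI: the record number is made up for this example.
EXAMPLE_DOI = "10.5281/zenodo.0000000"


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a request for a formatted citation string."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def bibtex_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request for a BibTeX entry."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )


req = citation_request(EXAMPLE_DOI)
print(req.full_url, "|", req.get_header("Accept"))

# Sending the request needs network access and a DOI that really exists:
# with urllib.request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```

Whether a given style name is available depends on the service; "apa" is used here only as a common example.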
GitHub - Zenodo Integration
- GitHub Guide: Making Your Code Citable
- GenR-Blogpost: Make Your Code Citable Using GitHub and Zenodo: A How-to Guide
GRASS GIS specific information
The following feedback was provided by the Zenodo helpdesk in Q1/Q2 2019, ahead of the GRASS GitHub migration:
Many DOIs for individual GRASS modules, or rather one DOI for the GRASS GIS software framework?
"Yes, in principle it's possible to issue DOIs for all modules, but I don't think this is useful. You mention e.g. that it would be nice to cite both the overall system as well as individual modules. This will essentially "dilute" the citations over many DOIs and thus the citation count for GRASS will seem a lot lower than it actually is. Thus for getting credit, it's better to have one DOI per major version of GRASS (where each version can have an updated author list). Also, having many DOIs makes it very difficult for discovery systems to track the citations automatically. Essentially Zenodo is the first system, where we can actually aggregate citations for all versions and a specific version of software."
"If you want to give credit to individual modules, I think it's better to then simply mention it. For instance, the journal text could mention it used module X, and the landing page of DOI for GRASS, could simply have a description detailing who did which module."
"We have now collected some 5000 citations to software in Zenodo, and what we can see for the top cited packages is that if the project provides a "citation recommendation" then people actually follow it. Example:"
How to integrate previous releases of GRASS GIS into Zenodo
Q: If possible, all GRASS releases should be made available for scientific citation via Zenodo (overview over all releases here:
The oldest GRASS release that has been preserved as a tarball (GRASS 4.3 from 1999) predates software versioning. All later releases are available in the current SVN repo (https://trac.osgeo.org/grass/browser/grass/#branches).
So to be able to get a DOI for each major version of GRASS, as advised in your previous response, can you please recommend an approach how to feed the sequence of previous major GRASS releases via GitHub into Zenodo, so the DOI for the releases are in the proper sequence (DOI for GRASS 5.X referencing to the newer releases of GRASS6.x, 7.x, etc.) ? "
Answer: If you want to keep the history of releases in order, I would suggest you upload them in order of the release date (starting with the oldest one). Nevertheless, please notice that Zenodo sorts the releases by upload date and not by a version number. This means that any future release (e.g. minor versions of previous releases) will be simply the next one in the list by DateTime.... I hope I manage to explain it :)
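The ordering advice can be sketched in a few lines: sort the releases by release date and upload them in that order. The tags and dates below are invented placeholders, not the real GRASS release history. The example also shows the caveat from the answer: a maintenance release of an older series that is published late ends up last in the list, regardless of its version number.

```python
from datetime import date

# Invented placeholder tags and dates, for illustration only.
releases = [
    {"tag": "demo_7_0", "released": date(2015, 2, 1)},
    {"tag": "demo_5_0", "released": date(2002, 9, 1)},
    {"tag": "demo_6_4", "released": date(2016, 10, 1)},  # late maintenance release
    {"tag": "demo_6_0", "released": date(2005, 3, 1)},
]

# Zenodo lists versions by upload time, so upload oldest release first.
upload_order = sorted(releases, key=lambda release: release["released"])

for position, release in enumerate(upload_order, start=1):
    print(position, release["tag"], release["released"].isoformat())
```

In this made-up history, `demo_6_4` is uploaded after `demo_7_0` because it was released later.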
What happens if we screw this up ?
Answer: First, in case you make a mistake, we do have the possibility to reorder the releases manually. Naturally we would like to avoid this, however just rest assured that we can fix it if you make a mistake. If you want to use our GitHub integration, then you must move the source code to GitHub and activate the repository in Zenodo (see the GitHub guide). Afterwards, you make a new release in GitHub for each of your releases (see also the GitHub guide). You have to make the releases in the order you want them to appear in Zenodo. If you have tags pushed to GitHub, then you can upgrade a tag to a release in the GitHub interface.
Necessary steps to set up GitHub-Zenodo integration for GRASS GIS (according to https://guides.github.com/activities/citable-code/)
- Create a functional (role-based) email address for the Zenodo account (firstname.lastname@example.org ?)
- Log in to the Zenodo account using the credentials of the GitHub account of the GRASS Developer team (Log in button at the top right of the Zenodo page).
- Zenodo will redirect you back to GitHub to ask for your permission to share your email address (email@example.com ?) and the ability to configure webhooks on your repository. Go ahead and click Authorize application to give Zenodo the permissions it needs. Important! We need to archive a repository that belongs to an organization (OSGeo) on GitHub: Make sure that the organization administrator has enabled third-party access to the Zenodo application.
- At this point, you’ve authorized Zenodo to configure the repository webhooks needed to allow for archiving and DOI-issuing. To enable this functionality, simply click the On toggle button next to the GRASS repository.
- Check repository settings: By enabling archiving in Zenodo, you have set up a new webhook on your repository. Click the settings tab on your repository, and then click ‘Webhooks’ in the left-hand menu.
- Unless you’ve created releases (we have) for this repository before, you will be asked to Create a new release.
- (is this applicable for GRASS ?) If this is the first release of your code (not the case for GRASS) then you should give it a version number of v1.0.0. Fill in any release notes and click the Publish release button.
- Checking everything has worked: Creating a new release will trigger Zenodo into archiving your repository. You can confirm that this process took place by clicking the Upload tab in your Zenodo profile. You should see a new upload in the right-hand panel.
- Minting a DOI: Before Zenodo can issue a DOI for your repository, you will need to provide some information about the GitHub repo that you’ve just archived. Once you’re happy with the description of your software, click the Publish button at the bottom of the Zenodo form, and voilà, you’ve just made a new DOI for your GitHub repository!
- Back on your Zenodo GitHub page you should now see your repository listed with a shiny new badge showing your new DOI!
- ProTip: If you really want to show off, then right click on the gray and blue DOI image and copy the URL and place it in your README on your GitHub repo.
- Mint DOI versions for all existing GRASS releases which have been tagged within GitHub into Zenodo. This has to be done in the right sequence (oldest release first, latest release last)
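The last step, turning existing tags into GitHub releases oldest first, is described above for the GitHub web interface. As an alternative, the sketch below assumes the GitHub REST API endpoint for creating a release (`POST /repos/{owner}/{repo}/releases`). The owner, repository, token and tag names are placeholders, and the requests are only built, not sent.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def release_request(owner: str, repo: str, tag: str, token: str,
                    notes: str = "") -> urllib.request.Request:
    """Build (but do not send) a 'create release' request for an existing tag."""
    payload = {"tag_name": tag, "name": tag, "body": notes}
    return urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/releases",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; the tags must already be sorted oldest release first.
tags_oldest_first = ["demo_5_0", "demo_6_0", "demo_7_0"]
requests_in_order = [
    release_request("example-org", "example-repo", tag, token="TOKEN")
    for tag in tags_oldest_first
]

for req in requests_in_order:
    print(req.get_method(), req.full_url, json.loads(req.data)["tag_name"])

# Sending each request (urllib.request.urlopen(req)) would publish the release.
```

Since each published release triggers an archiving run in Zenodo, it seems prudent to send the requests one at a time and check the Zenodo upload list before continuing with the next tag.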
Loose ends and options for future extensions
- Fine-grained citation of individual GRASS modules is provided by the GRASS GIS add-on module g.citation. The citation strings are generated from the man page of the module(s) to be cited. Since GRASS GIS module man pages are currently referenced only by URL, not by DOI, all citation strings are based on URLs. Once a best practice has been identified to cite a specific GRASS module based on the Zenodo DOI of the GRASS GIS release, g.citation should be extended to include this capability. This would allow DOI-based citation of individual GRASS modules. The issue of adequate authorship information (DOI-for-a-GRASS-release: GRASS Developer Team, GRASS module: A number [1..n] of persons) needs to be discussed in this context.
- In scientific citations of the GRASS GIS project, the due credit is given to the "GRASS GIS developer team" as a group. The members of this group have changed over time as new members joined the group while others have departed. Also, persons take on different roles within the group which can also change over time. Some members chose to remain anonymous. There seems to be currently no mechanism to determine easily who were the members of the GRASS GIS developer team at a specific point in time and to link them to the DOI.
- The Zenodo account which contains the archived GRASS GIS software releases is associated with the GRASS GIS developer team. The respective DOIs (versions) for all GRASS GIS releases archived in Zenodo are therefore "owned" by the developer team as a whole, but cannot be claimed by individual members of the development team (and their respective ORCID iDs). This is an unresolved problem. Best practice examples are needed to ensure that members of the development team can receive their due recognition.
- Credit by software citation for members of the GRASS developer team should include recognition of the different roles (e.g. original author, code maintenance, porting efforts, documentation, bugfixing, etc.). To some extent, classifications for relators developed by the Library of Congress could be used, as is being done by the R community (Details). This has not been addressed in the context of GRASS GIS.
- DOIs can be bundled or tied together (this is already implemented in Zenodo). Best practices are needed for how the GRASS community should use this (e.g. linking DOIs for scientific papers, code, data, documentation, videos, etc.).
- If the GRASS or OSGeo communities should decide in the future to become DOI-minting entities (by signing a contract with DataCite), it is feasible to link already existing Zenodo DOIs for the GRASS GIS releases to "new" OSGeo (or GRASS GIS) DOIs. From the DOI perspective, there is no "vendor lock-in" by using the Zenodo archive.
- Develop best practices for tying independent GRASS add-on modules, which are already archived in Zenodo on their own, into the GRASS GIS codebase Zenodo archive.