GRASS mailing list community evolution

From GRASS-Wiki

Latest revision as of 22:28, 23 December 2016

Watching how grass-dev develops (and grass-user is used)

DRAFT - work in progress - by A Giacomelli and M Neteler

Introduction

During the 10th GRASS GFOSS User meeting (http://gfoss2009.crs4.it/) in Cagliari, Italy, a summary of the activities of the Italian GFOSS community was presented. Together with basic indicators on the activity of the Italian community, some simple yet intriguing statistics derived from an analysis of the main discussion mailing lists were shown. (TODO: give two examples of this.)

In the typical brainstorming atmosphere which permeates events such as software user meetings, we considered the idea of replicating the same analysis on two other mailing lists with a much longer history, namely the GRASS developer list, grass-dev (http://lists.osgeo.org/pipermail/grass-dev/), and the GRASS user list, grass-user (http://lists.osgeo.org/pipermail/grass-user/).

The outcome of the analysis provides a unique insight into the dynamics of the user and developer communities over an extremely long time span, from 1991 through 2008.

How source data was collected

The story of the creation of a seventeen-year-long archive of communications deserves some description, as it is representative of the effort spent in maintaining a historical record of the communications among developers (and users) through the various phases of the GRASS project.

SHORT STORY ABOUT LONG CONVERSATION (uh, could be better title)

Members of the US Army Corps of Engineers (US Army CERL) launched the grass-user and grass-dev "GRASShopper" mailing lists in December 1991 ("Opening Night" in grass-dev and grass-user). The first user message came from Politecnico di Milano in Italy. Only in the following year was the comp.infosystems.gis Usenet group born, followed by an interface between the GRASS mailing lists and Usenet in December 1992.

  • new mailing lists were set up in 1999 at the University of Hannover
  • in 2001 the lists migrated to Italy, together with MN and the server migration
  • missing emails were recovered from Deja News and merged into the original list mbox files (which MN received from the US Army; the details are not remembered precisely)
  • all email headers for many years had to be reconstructed because their format was broken
  • the complete archive was restored and put online (TODO: MN to check the date)
  • in 2007 the lists migrated to the OSGeo infrastructure

Analysis Methodology

The information extraction approach leans towards the KISS side: the core of the parsing is handled by a Perl script, while the remaining post-processing is carried out with standard queries and no-nonsense charting tools.
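As an illustration of the first step, here is a minimal sketch of how the Date headers could be pulled out of a list archive. It is written in Python with the standard `mailbox` module rather than in Perl, so it is a stand-in for the original script, not a copy of it. The function name `date_headers` and the assumption that the archive is a single mbox file are ours.

```python
import mailbox


def date_headers(mbox_path):
    """Yield the raw Date header of every message in an mbox file.

    Messages without a Date header yield an empty string, so that they
    can be counted later as "not parsed".
    """
    for message in mailbox.mbox(mbox_path):
        yield message.get("Date", "")
```

Keeping the header as a raw string leaves all interpretation (year, time zone, local hour) to later steps, which makes it easier to inspect the messages that fail to parse.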

Time and space

The first core set of information extracted was the time zone reference of the messages, considering that time zone may be used to provide an approximate indication of longitude.

One of the drivers for our analysis was to verify whether and how the mailing lists provide evidence of the shift of development activity from the initial US-based model to Europe, rather than to provide a detailed spatial distribution of the developers or the users. This meant that the time zone reference alone would be an adequate proxy for the location of the source of a given message.

For the grass-dev list, a first pass with the scripts we developed correctly parsed over 99% of the messages. Greater completeness may be obtainable by refining the parsing algorithm to handle the exceptions encountered in the process, but we considered the level of approximation obtained in extracting the time zone reference to be adequate for the quality objectives of our analysis.

For the grass-user mailing list, the number of messages with time zone not identified by the first pass of the parsing algorithm is higher (some 3%), but still considered satisfactory within the scope of the current analysis.
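The time zone extraction and the parse rate quoted above can be sketched as follows. This is only an illustration in Python under our own assumptions: it looks for a numeric UTC offset such as `-0600` at the end of the Date header and treats everything else (zone names such as `GMT`, malformed headers) as not identified. The names `OFFSET_RE`, `utc_offset_hours` and `parse_rate` are hypothetical, and the original Perl script may have handled more cases.

```python
import re

# Numeric UTC offset such as "-0600" or "+0100" at the end of a Date header,
# optionally followed by a zone comment such as "(CET)".
OFFSET_RE = re.compile(r"([+-])(\d{2})(\d{2})\s*(?:\(.*\))?\s*$")


def utc_offset_hours(date_header):
    """Return the UTC offset in hours, or None if no numeric offset is found."""
    match = OFFSET_RE.search(date_header)
    if match is None:
        return None
    sign, hours, minutes = match.groups()
    value = int(hours) + int(minutes) / 60
    return -value if sign == "-" else value


def parse_rate(date_headers):
    """Fraction of headers for which a time zone offset could be extracted."""
    offsets = [utc_offset_hours(header) for header in date_headers]
    return sum(offset is not None for offset in offsets) / len(offsets)


sample = [
    "Mon, 16 Dec 1991 10:30:00 -0600",
    "Tue, 3 Mar 1998 09:15:00 +0100 (CET)",
    "Wed, 5 Jan 1994 14:00:00 GMT",  # no numeric offset: counted as not parsed
]
print([utc_offset_hours(header) for header in sample])
print(parse_rate(sample))
```

Returning `None` instead of guessing keeps unparsed messages visible, so the share of messages without an identified time zone can be reported separately for each list.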


What do time and time zones tell

The charts (include numbers) show:

  • the absolute number of message postings by time zone and year (Figures 1 and 4, for grass-dev and grass-user respectively)
  • the relative proportion of messages posted each year from a given time zone (Figures 2 and 5, for grass-dev and grass-user respectively)
  • the cumulative proportion of messages originating from the different time zones, calculated taking as 100% the e-mail traffic generated from the beginning of the mailing list records through 2008 (Figures 3 and 6, for grass-dev and grass-user respectively)
  • Figure 7: the local time of posting on grass-dev, which shows that most of the communication takes place during business hours (and some in the evening)
(TODO: add various points for discussion here)


Figure 1: grass-dev: number of messages by time zone
Figure 2: grass-dev: relative proportion of messages per year
Figure 3: grass-dev: cumulative percentage of messages
Figure 4: grass-user: number of messages per time zone
Figure 5: grass-user: relative proportion of messages per time zone
Figure 6: grass-user: cumulative percentage of messages
Figure 7: grass-dev: local time of message posting by year
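The quantities behind the yearly and cumulative charts can be sketched as below. This is a hedged Python illustration, not the queries actually used: it assumes each message has already been reduced to a `(year, utc_offset)` pair, and the function names `yearly_shares` and `cumulative_shares` are ours.

```python
from collections import Counter, defaultdict


def yearly_shares(records):
    """Return {year: {offset: share}}, where the shares within a year sum to 1."""
    counts = defaultdict(Counter)
    for year, offset in records:
        counts[year][offset] += 1
    return {
        year: {offset: n / sum(by_tz.values()) for offset, n in by_tz.items()}
        for year, by_tz in counts.items()
    }


def cumulative_shares(records):
    """Return {year: {offset: share}} with running totals per time zone.

    100% is the whole traffic in `records`, so the shares of the last
    year sum to 1.
    """
    counts = defaultdict(Counter)
    for year, offset in records:
        counts[year][offset] += 1
    grand_total = sum(sum(by_tz.values()) for by_tz in counts.values())
    running = Counter()
    result = {}
    for year in sorted(counts):
        running.update(counts[year])
        result[year] = {offset: n / grand_total for offset, n in running.items()}
    return result


records = [(1991, -6), (1991, -6), (1992, -6), (1992, 1)]
print(yearly_shares(records))
print(cumulative_shares(records))
```

The absolute counts per time zone and year are the intermediate `counts` table; the two functions only differ in what they divide by, which is the distinction between the relative and the cumulative charts.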

...and what about the contents ?

Another interesting analysis is the extraction of specific keywords from the message body. While it can be extremely intriguing to build dictionaries of words and expressions used within a mailing list, in the case of the GRASS lists we decided to focus on GRASS commands. Matrices with the occurrence of GRASS commands by year were generated for both mailing lists. The clear limitation of this type of analysis is that the use of a term is not associated with its context: a reference to a specific command does not indicate whether it relates to a coding problem, to issues in use, or to a working example. Another element neglected in the analysis is quotation: an occurrence of a term is counted whenever it appears in the body of a message, including in quoted text. At the same time, we think that even a preliminary analysis provides extremely interesting insight into the bulk of mailing list traffic.

The review of the entries reported by the parser (Figures 8 and 9) (TODO: where does this lead us?)
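A command-by-year matrix of this kind can be sketched as follows. The Python below is an illustration under our own assumptions: the set `KNOWN_COMMANDS` is a tiny hypothetical subset of GRASS command names, every occurrence in the body is counted (quoted lines included, as described above), and the function name `command_matrix` is ours.

```python
import re
from collections import Counter, defaultdict

# Illustrative subset only; a real run would use the full list of commands.
KNOWN_COMMANDS = {"g.region", "r.mapcalc", "v.in.ogr"}

# Dotted lowercase tokens such as "r.mapcalc" or "v.in.ogr".
TOKEN_RE = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)+")


def command_matrix(messages):
    """Return {command: {year: count}} from (year, body) pairs.

    Context and quotation are ignored: any occurrence in the body counts.
    """
    matrix = defaultdict(Counter)
    for year, body in messages:
        for token in TOKEN_RE.findall(body.lower()):
            if token in KNOWN_COMMANDS:
                matrix[token][year] += 1
    return matrix


messages = [
    (1999, "Try r.mapcalc, i.e. run g.region first.\n> r.mapcalc fails"),
    (2000, "v.in.ogr works; r.mapcalc too"),
]
for command, by_year in sorted(command_matrix(messages).items()):
    print(command, dict(by_year))
```

Matching against a list of known names, rather than against the dotted pattern alone, avoids counting ordinary abbreviations such as "i.e." as if they were commands.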


Figure 8: grass-dev: mentions of commands by year
Figure 9: grass-user: occurrence over time of the 20 most mentioned commands

yay... This still needs to grow a bit. (At most, we limit ourselves to explaining that we are happy to have made a first extraction...)

Reference for details beyond the top 20 entries: worksheets 5 and 6 of the ODS file

Trivia

Already in the early days GRASS developers were concerned about copyright issues and the introduction of non-free code into GRASS. The GRASS FTP site was changed to a 24-hour service in September 1992. Recall that there was no WWW at all in those days. The first GRASS Linux binaries were announced in February 1994. A first European GRASS FTP mirror site appeared in January 1995. The first internet viruses were discussed in April 1995, and the first spam reached the lists in October 1995 (most spam was later manually removed from the list archives by MN). With CERL winding down development in 1996 due to a governmental decision, traffic on the developers list dropped to a minimum.

The GRASS project returned to real activity in 1997. A first attempt to get GRASS into Debian was made in March 1998. The new GRASS 4.2.1 package, the first release after 4.1 from CERL, became available in January 1998. The license change to GPL was discussed in October 1999. GRASS 5 beta releases came out in October 1999. In 2006 a GRASS Wiki was started (originally TWIKI at http://grass.gdf-hannover.de/wiki/, then migrated to MediaWiki in 2006, and eventually moved to OSGeo in 2008 at https://grasswiki.osgeo.org/). The mailing lists were migrated in 2007 from ITC-irst, Trento, Italy, to OSGeo, USA.

...

And then..

  • RELEASES AND EMAIL HYPE (MN)
  • THE 1990s: depression and renewal
  • Full steam with OSGeo... GRASS list in the top 10!

Ideas

Consider monitoring the mailing lists using https://github.com/elationfoundation/openThreads (see http://www.slideshare.net/apw217/sotm-openthreadsfinal for an introduction to the project).