Migration from CVS to SVN

From GRASS-Wiki
Revision as of 20:13, 4 November 2007 by Landa (Scenario 2: sed for 5_0)

This page contains notes related to GRASS code migration (planned) from CVS to SVN.

Basic

  • The SVN command-line interface closely resembles that of CVS. Many tasks are identical once the program name is changed from cvs to svn.

Gotchas

  • cvs2svn is known to break binary files (images) which were not imported into CVS with the -kb flag. Luckily Glynn fixed most of these some months ago.
  • Files using keyword substitution, such as $Date$ in the description.html files, need the svn:keywords property enabled manually, once per file (or in bulk with a find routine piped to xargs svn ...).
$ svn propset svn:keywords "Date" filename.txt
$ svn commit

or

$ find . -name '*.c' -print0 | xargs -0 svn propset svn:keywords "Date"
$ find . -name '*.html' -print0 | xargs -0 svn propset svn:keywords "Date"
$ svn commit
  • How can file timestamps be preserved? We want to keep the last modification date, not the date of the local checkout.
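
For the keyword-substitution item above, the find routine could first be narrowed to files that actually contain the $Date keyword, so the property is not set on files that never use it. A minimal sketch, assuming a checked-out working copy; keyword_files is a name made up for illustration, and the svn calls are shown only as comments:

```shell
# List files under a directory that contain a CVS-style $Date keyword.
# keyword_files is a hypothetical helper name, not part of svn.
keyword_files() {
    # Skip .svn administrative directories; grep -l prints only the
    # names of matching files, and [$] matches a literal dollar sign.
    find "$1" -name .svn -prune -o -type f -exec grep -l '[$]Date' {} +
}

# Possible use inside a working copy (not run here):
#   keyword_files . | while IFS= read -r f; do
#       svn propset svn:keywords "Date" "$f"
#   done
#   svn commit -m "Enable Date keyword expansion"
```

Listing the candidates first also makes it easy to review them before any property is changed.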

GRASS CVS repository structure

/grass-cvs
|
+---/CVSROOT
|
+---/grass
|
+---/grass51
|
+---/grass6
|
+---/grass_doc
|
+---/libgrass
|
+---/newsletter
|
+---/programgrass50
|
+---/web

Testing SVN repository

  • grass/grass should be renamed to grass/grass5

Question:

  • Migrate all directories or only the actively used ones (grass, grass6, newsletter, web)?
  • Create separate repositories (grass5, grass6, grassweb, grassnewsletter, grass7) or one repository (grass-svn)?

Proposed structure

Scenario 1

/grass-svn
|
+---/grass5
    |
    +---/branches
    |
    +---/tags
    |
    +---/trunk
+---/grass6
    |
    +---/branches
    |
    +---/tags
    |
    +---/trunk
+---/grass7
    |
    +---/branches
    |
    +---/tags
    |
    +---/trunk
+---/newsletter
    |
    +---/branches (???)
    |
    +---/tags (???)
    |
    +---/trunk
+---/web
    |
    +---/trunk

Scenario 2

/grass-svn
|
+---/trunk
|
+---/branches
    |
    +---/grass5
    |
    +---/grass6
    |
    +---/grass7
+---/tags

Copy of GRASS CVS repository

rsync -r --times --links --bwlimit=200 --delete rsync://rsync.intevation.de/grass grass-cvs

→ approx. 600 MB!

Creating GRASS SVN repository

grass5

cvs2svn --use-cvs --no-default-eol \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
-s grass5svn-orig grass-cvs/grass

grass5svn-orig

Question: exclude selected branches/tags (which one?)

According to the list of undocumented branches/tags, the following are suggested for exclusion:

--exclude="Mike" \
--exclude="grass" \
--exclude="unlabeled-.*" \
--exclude=devices_cleanup_20000420 \
--exclude=post_compare_glynn_head_2002_11_27 \
--exclude=post_compare_glynn_release_2002_11_27 \
--exclude=post_merge_head_2002_01_22 \
--exclude=post_sync_2002_01_22 \
--exclude=pre-curses-fix \
--exclude=pre_merge_head_2002_01_22 \
--exclude=pre_merge_release_2002_01_22 \
--exclude=pre_sync_2001_10_31 \
--exclude=pre_sync_2002_01_17 \
--exclude=release_15_05_2004_grass5_3_0 \
--exclude=releasebranch_11_april_2001_5_0_0 \
--exclude=releasebranch_11_april_2001_5_0_0_DEAD \
--exclude=releasebranch_500 \
--exclude=releasebranch_5_0_0 \
--exclude=release_grass5beta11pre1_21_january_2001 \
--exclude=release_grass5beta11pre2_28_january_2001 \
--exclude=start \
--exclude=testbranch_5_0_0stable \
--exclude=grassreleasebranch_5_0_0 \

A lot of dependency problems:

ERROR: The branch 'unlabeled-1.7.4' cannot be excluded because the following symbols depend on it:
   'release_grass500pre1_20_may_2001'
   'release_13_september_2001_grass5_0_0_pre2'
   'release_16_january_2002_grass5_0_0_pre3'
   'releasebranch_14_august_2001_5_0_0'
ERROR: The branch 'unlabeled-1.6.4' cannot be excluded because the following symbols depend on it:
   'release_grass500pre1_20_may_2001'
   'color_changes_20010502'
   'release_13_september_2001_grass5_0_0_pre2'
   'release_16_january_2002_grass5_0_0_pre3'
   'releasebranch_14_august_2001_5_0_0'
...
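
Resolving these dependency errors by hand means collecting every symbol named under each ERROR line and adding it to the exclude list. The sketch below extracts those names from saved cvs2svn output. It assumes the message layout matches the excerpt above (dependent symbols indented and single-quoted); blocking_symbols and the log file name are hypothetical:

```shell
# Print the symbols that block an exclusion, one per line, de-duplicated.
# Reads cvs2svn output on stdin. Only indented lines consisting of a
# single-quoted name are picked up; the ERROR lines themselves are skipped.
blocking_symbols() {
    sed -n "s/^[[:space:]]\{1,\}'\([^']*\)'[[:space:]]*\$/\1/p" | LC_ALL=C sort -u
}

# Possible use (cvs2svn.log is a hypothetical capture of the failed run):
#   blocking_symbols < cvs2svn.log
```

Each printed name is a candidate for a further --exclude option, or a reason to keep the branch it depends on.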
cvs2svn --use-cvs --no-default-eol \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
--exclude="Mike" \
--exclude="grass" \
--exclude="unlabeled-.*" \
--exclude=devices_cleanup_20000420 \
--exclude=post_compare_glynn_head_2002_11_27 \
--exclude=post_compare_glynn_release_2002_11_27 \
--exclude=post_merge_head_2002_01_22 \
--exclude=post_sync_2002_01_22 \
--exclude=pre-curses-fix \
--exclude=pre_merge_head_2002_01_22 \
--exclude=pre_merge_release_2002_01_22 \
--exclude=pre_sync_2001_10_31 \
--exclude=pre_sync_2002_01_17 \
--exclude=release_03_11_2003_grass5_0_3 \
--exclude=release_05_11_2004_grass5_4_0 \
--exclude=release_10_04_2003_grass5_0_2 \
--exclude=release_13_may_2002_grass5_0_0_pre4 \
--exclude=release_13_september_2001_grass5_0_0_pre2 \
--exclude=release_15_05_2004_grass5_3_0 \
--exclude=release_16_january_2002_grass5_0_0_pre3 \
--exclude=release_17_06_2004_grass5_7_0 \
--exclude=release_25_06_2002_grass5_0_0_pre5 \
--exclude=release_28_01_2003_grass5_0_1 \
--exclude=release_30_08_2002_grass5_0_0 \
--exclude=releasebranch_11_april_2001_5_0_0 \
--exclude=releasebranch_11_april_2001_5_0_0_DEAD \
--exclude=releasebranch_500 \
--exclude=releasebranch_5_0_0 \
--exclude=release_grass500pre1_20_may_2001 \
--exclude=release_grass5beta10_7_december_2000 \
--exclude=release_grass5beta11_4_february_2001 \
--exclude=release_grass5beta11pre1_21_january_2001 \
--exclude=release_grass5beta11pre2_28_january_2001 \
--exclude=release_grass5beta6_16_feb_2000 \
--exclude=release_grass5beta7_20_april_2000 \
--exclude=release_grass5beta8_26_july_2000 \
--exclude=release_grass5beta9_6_december_2000 \
--exclude=start \
--exclude=testbranch_5_0_0stable \
--exclude=releasebranch_5_4 \
--exclude=grassreleasebranch_5_0_0 \
--exclude=releasebranch_26_april_2002_5_0_0 \
-s grass5svn grass-cvs/grass

grass5svn
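
Command lines with dozens of --exclude options, like the one above, are hard to review and to keep in sync between runs. One possible arrangement is to keep the symbol names in a text file and generate the options from it. This is only a sketch; exclude_args and excludes.txt are made-up names:

```shell
# Turn a file with one symbol name (or pattern) per line into
# --exclude=... options, one per output line.
# Blank lines and lines starting with '#' are ignored.
exclude_args() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | sed 's/^/--exclude=/'
}

# Possible use (not run here). "set -f" turns off filename globbing so
# that patterns such as unlabeled-.* reach cvs2svn unchanged:
#   set -f
#   cvs2svn --use-cvs --no-default-eol $(exclude_args excludes.txt) \
#       -s grass5svn grass-cvs/grass
```

The same file could then be reused for the grass6 conversion, where most of the excluded symbols are identical.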

grass6

cvs2svn --use-cvs --no-default-eol \
-s grass6svn grass-cvs/grass6
...
Error summary:
ERROR: A CVS repository cannot contain both grass-cvs/grass6/display/d.erase/main.c,v and grass-cvs/grass6/display/d.erase/Attic/main.c,v
ERROR: A CVS repository cannot contain both grass-cvs/grass6/general/g.mapsets/main_inter.c,v and grass-cvs/grass6/general/g.mapsets/Attic/main_inter.c,v
ERROR: A CVS repository cannot contain both grass-cvs/grass6/include/gproj_api.h,v and grass-cvs/grass6/include/Attic/gproj_api.h,v
ERROR: A CVS repository cannot contain both grass-cvs/grass6/visualization/nviz/src/getCat.c,v and grass-cvs/grass6/visualization/nviz/src/Attic/getCat.c,v
Exited due to fatal error(s).

Question: Remove Attic files?

rm -f grass-cvs/grass6/display/d.erase/Attic/main.c,v
rm -f grass-cvs/grass6/general/g.mapsets/Attic/main_inter.c,v
rm -f grass-cvs/grass6/include/Attic/gproj_api.h,v
rm -f grass-cvs/grass6/visualization/nviz/src/Attic/getCat.c,v
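
Before deleting anything, the conflicting pairs can be listed so that the rm commands above are checked against what is really in the repository copy. A minimal sketch, assuming Attic directories contain only ,v files directly (no subdirectories); attic_duplicates is a hypothetical helper name:

```shell
# Print every Attic/*,v file whose name also exists in the parent
# directory, which is the situation cvs2svn rejects.
attic_duplicates() {
    find "$1" -type f -path '*/Attic/*,v' | while IFS= read -r attic; do
        # Directory that contains the Attic directory.
        dir=$(dirname "$(dirname "$attic")")
        live="$dir/$(basename "$attic")"
        if [ -f "$live" ]; then
            printf '%s\n' "$attic"
        fi
    done
}

# Possible use on the local rsync copy:
#   attic_duplicates grass-cvs/grass6
```

Working on the rsync copy rather than the live CVS repository keeps the removal reversible.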

Restart

cvs2svn --use-cvs --no-default-eol \
-s grass6svn grass-cvs/grass6
...
----- pass 2 (CollateSymbolsPass) -----
ERROR: It is not clear how the following symbols should be converted.
Use --force-tag, --force-branch and/or --exclude to resolve the ambiguity.
   'releasebranch_6_2' is a tag in 2 files, a branch in 5259 files and has commits in 1513 files

Restart

cvs2svn --use-cvs --no-default-eol \
--force-branch=releasebranch_6_2 \
-s grass6svn-orig grass-cvs/grass6

grass6svn-orig

Exclude all undocumented branches

cvs2svn --use-cvs --no-default-eol \
--force-branch=releasebranch_6_2 \
--exclude="grass" \
--exclude="grassreleasebranch_5_0_0" \
--exclude="markus" \
--exclude="releasebranch_14_august_2001_5_0_0" \
--exclude="releasebranch_26_april_2002_5_0_0" \
--exclude="releasebranch_5_4" \
--exclude="unlabeled.*" \
-s grass6svn1 grass-cvs/grass6
...
----- pass 2 (CollateSymbolsPass) -----
Checking for blocked exclusions...
ERROR: The branch 'markus' cannot be excluded because the following symbols depend on it:
    'start'
    'releasebranch_11_april_2001_5_0_0'
ERROR: The branch 'grass' cannot be excluded because the following symbols depend on it:
...
cvs2svn --use-cvs --no-default-eol \
--force-branch=releasebranch_6_2  \
--exclude="grass" \
--exclude="grassreleasebranch_5_0_0" \
--exclude="markus" \
--exclude="releasebranch_14_august_2001_5_0_0" \
--exclude="releasebranch_26_april_2002_5_0_0" \
--exclude="releasebranch_5_4" \
--exclude="unlabeled.*" \
--exclude="devices_cleanup_20000420" \
--exclude="post_compare_glynn_head_2002_11_27" \
--exclude="post_compare_glynn_release_2002_11_27" \
--exclude="post_merge_head_2002_01_22" \
--exclude="post_sync_2002_01_22" \
--exclude="pre-curses-fix" \
--exclude="pre_merge_head_2002_01_22" \
--exclude="pre_merge_release_2002_01_22" \
--exclude="pre_sync_2001_10_31" \
--exclude="pre_sync_2002_01_17" \
--exclude="release_03_11_2003_grass5_0_3" \
--exclude="release_05_11_2004_grass5_4_0" \
--exclude="release_10_04_2003_grass5_0_2" \
--exclude="release_13_may_2002_grass5_0_0_pre4" \
--exclude="release_13_september_2001_grass5_0_0_pre2" \
--exclude="release_15_05_2004_grass5_3_0" \
--exclude="release_16_january_2002_grass5_0_0_pre3" \
--exclude="release_17_06_2004_grass5_7_0" \
--exclude="release_25_06_2002_grass5_0_0_pre5" \
--exclude="release_28_01_2003_grass5_0_1" \
--exclude="release_30_08_2002_grass5_0_0" \
--exclude="releasebranch_11_april_2001_5_0_0" \
--exclude="releasebranch_11_april_2001_5_0_0_DEAD" \
--exclude="releasebranch_500" \
--exclude="releasebranch_5_0_0" \
--exclude="release_grass500pre1_20_may_2001" \
--exclude="release_grass5beta10_7_december_2000" \
--exclude="release_grass5beta11_4_february_2001" \
--exclude="release_grass5beta11pre1_21_january_2001" \
--exclude="release_grass5beta11pre2_28_january_2001" \
--exclude="release_grass5beta6_16_feb_2000" \
--exclude="release_grass5beta7_20_april_2000" \
--exclude="release_grass5beta8_26_july_2000" \
--exclude="release_grass5beta9_6_december_2000" \
--exclude="start" \
--exclude="testbranch_5_0_0stable" \
--exclude="unlabeled-1.1.1.1.4" \
--exclude="unlabeled-1.1.1.1.6" \
--exclude="color_changes_20010502" \
-s grass6svn grass-cvs/grass6

Question: Exclude more tags (which)?

  • I guess the tags "freetypecap", "lastworking", "pre_fileinfo_change" and maybe also "pre_vdigit_changes_.*" could also be excluded... --ML

Add other switches (e.g. --encoding) ??

cvs2svn --use-cvs --no-default-eol \
--force-branch=releasebranch_6_2  \
--exclude="grass" \
--exclude="grassreleasebranch_5_0_0" \
--exclude="markus" \
--exclude="releasebranch_14_august_2001_5_0_0" \
--exclude="releasebranch_26_april_2002_5_0_0" \
--exclude="releasebranch_5_4" \
--exclude="unlabeled.*" \
--exclude="devices_cleanup_20000420" \
--exclude="post_compare_glynn_head_2002_11_27" \
--exclude="post_compare_glynn_release_2002_11_27" \
--exclude="post_merge_head_2002_01_22" \
--exclude="post_sync_2002_01_22" \
--exclude="pre-curses-fix" \
--exclude="pre_merge_head_2002_01_22" \
--exclude="pre_merge_release_2002_01_22" \
--exclude="pre_sync_2001_10_31" \
--exclude="pre_sync_2002_01_17" \
--exclude="release_03_11_2003_grass5_0_3" \
--exclude="release_05_11_2004_grass5_4_0" \
--exclude="release_10_04_2003_grass5_0_2" \
--exclude="release_13_may_2002_grass5_0_0_pre4" \
--exclude="release_13_september_2001_grass5_0_0_pre2" \
--exclude="release_15_05_2004_grass5_3_0" \
--exclude="release_16_january_2002_grass5_0_0_pre3" \
--exclude="release_17_06_2004_grass5_7_0" \
--exclude="release_25_06_2002_grass5_0_0_pre5" \
--exclude="release_28_01_2003_grass5_0_1" \
--exclude="release_30_08_2002_grass5_0_0" \
--exclude="releasebranch_11_april_2001_5_0_0" \
--exclude="releasebranch_11_april_2001_5_0_0_DEAD" \
--exclude="releasebranch_500" \
--exclude="releasebranch_5_0_0" \
--exclude="release_grass500pre1_20_may_2001" \
--exclude="release_grass5beta10_7_december_2000" \
--exclude="release_grass5beta11_4_february_2001" \
--exclude="release_grass5beta11pre1_21_january_2001" \
--exclude="release_grass5beta11pre2_28_january_2001" \
--exclude="release_grass5beta6_16_feb_2000" \
--exclude="release_grass5beta7_20_april_2000" \
--exclude="release_grass5beta8_26_july_2000" \
--exclude="release_grass5beta9_6_december_2000" \
--exclude="start" \
--exclude="testbranch_5_0_0stable" \
--exclude="unlabeled-1.1.1.1.4" \
--exclude="unlabeled-1.1.1.1.6" \
--exclude="color_changes_20010502" \
--exclude="freetypecap" \
--exclude="lastworking" \
--exclude="pre_fileinfo_change" \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
-s grass6svn grass-cvs/grass6

All documented branches/tags were migrated. The repository also contains some undocumented tags: grass_6_0_0, grass_6_0_0beta1, post_vdigit_changes_2007023, pre_vdigit_changes_20070221, pre_vdigit_changes_2007023, release_20071021_grass_6_2_3RC1. In contrast, releasebranch_6_0 is missing.
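
A check like the one above (which documented symbols arrived, which are missing) can be repeated after each conversion run by comparing two plain lists. The sketch below assumes one name per line in each file; missing_symbols, expected.txt and actual.txt are hypothetical names:

```shell
# Print the names listed in the first file but absent from the second.
missing_symbols() {
    # -F fixed strings, -x whole-line match, -v invert the match,
    # -f read the patterns from a file. grep exits with status 1 when
    # nothing is missing, which is treated as success here.
    grep -Fxv -f "$2" "$1" || true
}

# Possible use (not run here); svn ls prints directories with a
# trailing slash, which sed strips:
#   svn ls "file://$(pwd)/grass6svn/branches" | sed 's|/$||' > actual.txt
#   missing_symbols expected.txt actual.txt
```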

grass6svn

newsletter

SUGGESTION: merge newsletter into the OSGeo journal SVN

cvs2svn --use-cvs --no-default-eol \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
-s grassnlsvn grass-cvs/newsletter

Question: exclude branch "markus" and tags "Final_version", "ready_for_grammar_spelling_correction" and "start"?

  • I guess maybe only trunk needs to be migrated, we don't need any branches and tags in this case... --ML
cvs2svn --use-cvs --no-default-eol \
--exclude="markus" \
--exclude="Final_version" \
--exclude="ready_for_grammar_spelling_correction" \
--exclude="start" \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
-s grassnlsvn grass-cvs/newsletter
...
Checking for blocked exclusions...
ERROR: The branch 'markus' cannot be excluded because the following symbols depend on it:
    'volume1_final'
...

Cannot exclude branch "markus"

cvs2svn --use-cvs --no-default-eol \
--exclude="Final_version" \
--exclude="ready_for_grammar_spelling_correction" \
--exclude="start" \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
-s grassnlsvn grass-cvs/newsletter

Questions:

  • Rename branch "markus"?
  • Rename tags "volume[1|2]_final" to "release_vol[1|2]"?

I guess only trunk is enough -- ML

cvs2svn --use-cvs --no-default-eol \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
--trunk-only \
-s grassnlsvn grass-cvs/newsletter

grassnlsvn

web

cvs2svn --use-cvs --no-default-eol \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
-s grasswebsvn grass-cvs/web
...
ERROR: A CVS repository cannot contain both grass-cvs/web/bugtracking/index.html,v and grass-cvs/web/bugtracking/Attic/index.html,v
...

Question: Remove Attic file?

rm -f grass-cvs/web/bugtracking/Attic/index.html,v

Question: Ignore branches ("markus") and tags ("start")?

cvs2svn --use-cvs --no-default-eol \
--trunk-only \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
-s grasswebsvn grass-cvs/web

grasswebsvn

grass7

Based on grass6 HEAD.

cvs2svn --use-cvs --no-default-eol \
--force-branch=releasebranch_6_2 \
--trunk-only \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
-s grass7svn grass-cvs/grass6

grass7svn

Scenario 1

svnadmin create grass-svn
cvs2svn --use-cvs --no-default-eol \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
--branches=grass5/branches \
--tags=grass5/tags \
--trunk=grass5/trunk \
--existing-svnrepos \
-s grass-svn grass-cvs/grass
cvs2svn --use-cvs --no-default-eol \
--force-branch=releasebranch_6_2  \
--exclude="grass" \
--exclude="grassreleasebranch_5_0_0" \
--exclude="markus" \
--exclude="releasebranch_14_august_2001_5_0_0" \
--exclude="releasebranch_26_april_2002_5_0_0" \
--exclude="releasebranch_5_4" \
--exclude="unlabeled.*" \
--exclude="devices_cleanup_20000420" \
--exclude="post_compare_glynn_head_2002_11_27" \
--exclude="post_compare_glynn_release_2002_11_27" \
--exclude="post_merge_head_2002_01_22" \
--exclude="post_sync_2002_01_22" \
--exclude="pre-curses-fix" \
--exclude="pre_merge_head_2002_01_22" \
--exclude="pre_merge_release_2002_01_22" \
--exclude="pre_sync_2001_10_31" \
--exclude="pre_sync_2002_01_17" \
--exclude="release_03_11_2003_grass5_0_3" \
--exclude="release_05_11_2004_grass5_4_0" \
--exclude="release_10_04_2003_grass5_0_2" \
--exclude="release_13_may_2002_grass5_0_0_pre4" \
--exclude="release_13_september_2001_grass5_0_0_pre2" \
--exclude="release_15_05_2004_grass5_3_0" \
--exclude="release_16_january_2002_grass5_0_0_pre3" \
--exclude="release_17_06_2004_grass5_7_0" \
--exclude="release_25_06_2002_grass5_0_0_pre5" \
--exclude="release_28_01_2003_grass5_0_1" \
--exclude="release_30_08_2002_grass5_0_0" \
--exclude="releasebranch_11_april_2001_5_0_0" \
--exclude="releasebranch_11_april_2001_5_0_0_DEAD" \
--exclude="releasebranch_500" \
--exclude="releasebranch_5_0_0" \
--exclude="release_grass500pre1_20_may_2001" \
--exclude="release_grass5beta10_7_december_2000" \
--exclude="release_grass5beta11_4_february_2001" \
--exclude="release_grass5beta11pre1_21_january_2001" \
--exclude="release_grass5beta11pre2_28_january_2001" \
--exclude="release_grass5beta6_16_feb_2000" \
--exclude="release_grass5beta7_20_april_2000" \
--exclude="release_grass5beta8_26_july_2000" \
--exclude="release_grass5beta9_6_december_2000" \
--exclude="start" \
--exclude="testbranch_5_0_0stable" \
--exclude="unlabeled-1.1.1.1.4" \
--exclude="unlabeled-1.1.1.1.6" \
--exclude="color_changes_20010502" \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
--branches=grass6/branches \
--tags=grass6/tags \
--trunk=grass6/trunk \
--existing-svnrepos \
-s grass-svn grass-cvs/grass6
cvs2svn --use-cvs --no-default-eol \
--exclude="Final_version" \
--exclude="ready_for_grammar_spelling_correction" \
--exclude="start" \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
--branches=newsletter/branches \
--tags=newsletter/tags \
--trunk=newsletter/trunk \
--existing-svnrepos \
-s grass-svn grass-cvs/newsletter
cvs2svn --use-cvs --no-default-eol \
--trunk-only \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
--trunk=web/trunk \
--existing-svnrepos \
-s grass-svn grass-cvs/web
cvs2svn --use-cvs --no-default-eol \
--force-branch=releasebranch_6_2 \
--trunk-only \
--encoding="ASCII" \
--encoding="UTF-8" \
--encoding="ISO-8859-1" \
--trunk=grass7/trunk \
--existing-svnrepos \
-s grass-svn grass-cvs/grass6

grass-svn
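
In Scenario 1 every project receives the same three layout options, differing only in the project prefix. A small helper could generate them and reduce copy-and-paste slips between the five runs. This is a sketch only; layout_opts is a made-up name, and trunk-only projects such as web would use just the first option:

```shell
# Print the cvs2svn layout options for one project inside a shared
# repository, one option per line.
layout_opts() {
    printf '%s\n' "--trunk=$1/trunk" "--branches=$1/branches" "--tags=$1/tags"
}

# Possible use (not run here):
#   cvs2svn --use-cvs --no-default-eol $(layout_opts grass5) \
#       --existing-svnrepos -s grass-svn grass-cvs/grass
```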

Scenario 2

Notes:

  • newsletter repository can be merged with OSGeo newsletter repository
  • web in separate repository outside of trac
svnadmin create grass-svn2 # cp -r grass6svn grass-svn2 

svnadmin dump grass6svn  > grass6svn.dump
svnadmin load grass-svn2 < grass6svn.dump

svnadmin dump grass5svn-orig > grass5svn-orig.dump

# note: grass and Mike must be included because of dependency
cat grass5svn-orig.dump | svndumpfilter include \
trunk \
branches/releasebranch_5_4 \
tags/release_05_11_2004_grass5_4_0 \
tags/release_26_07_2007_grass5_4_1 \
branches/releasebranch_26_april_2002_5_0_0 \
tags/release_03_11_2003_grass5_0_3 \
branches/grass \
branches/Mike \
--drop-empty-revs --renumber-revs > grass5svn-filter.dump

cat grass5svn-filter.dump | \
sed 's/Node-path: trunk/Node-path: branches\/releasebranch_5_5/g' | \
sed 's/Node-copyfrom-path: trunk/Node-copyfrom-path: branches\/releasebranch_5_5/g' | \
sed 's/Node-path: branches\/releasebranch_26_april_2002_5_0_0/Node-path: branches\/releasebranch_5_0/g' | \
sed 's/Node-copyfrom-path: branches\/releasebranch_26_april_2002_5_0_0/Node-copyfrom-path: branches\/releasebranch_5_0/g' > \
grass5svn-filter1.dump
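
The sed expressions above match anywhere in a line, so a path such as trunkfoo, or file content that happens to contain the same text, would be rewritten too. A stricter variant could anchor each expression at the start of the header line and require the path to end there or continue with a slash. This is a sketch, not a tested migration step; rename_dump_path is a hypothetical helper:

```shell
# Rewrite one top-level path in an svnadmin dump stream read from stdin.
# The old path must be free of regex metacharacters and of the '|'
# delimiter used in the sed expressions below.
rename_dump_path() {
    old=$1
    new=$2
    # LC_ALL=C makes sed treat the dump as plain bytes.
    LC_ALL=C sed \
        -e "s|^\(Node-path: \)$old\$|\1$new|" \
        -e "s|^\(Node-path: \)$old/|\1$new/|" \
        -e "s|^\(Node-copyfrom-path: \)$old\$|\1$new|" \
        -e "s|^\(Node-copyfrom-path: \)$old/|\1$new/|"
}

# Possible use (not run here):
#   rename_dump_path trunk branches/releasebranch_5_5 \
#       < grass5svn-filter.dump > grass5svn-filter1.dump
```

A file whose content contains a line starting with "Node-path: " followed by the old path would still be altered, so the result should be test-loaded into a scratch repository before use.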

grass-svn2

MIME types

According to Converting CVS to subversion:

CVSREPOS="$(pwd)/grass-cvs"
PROJECTNAME=grass6
# Find all extensions. Also include filenames without extension.
# The E (extension) and S (slash) trick is to get GNU sort to separate
# them, although this is not really necessary. But note that at the
# same time it removes the leading slash from filenames without extension.
find $CVSREPOS/$PROJECTNAME -type f -name '*,v' ! -name '.cvsignore,v' | \
   sed -e 's%.*\([./][^.]*\),v$%\1%' -e 's/\./E/' -e 's/\//S/' | \
   sort -u | sed -e 's/^S//' -e 's/^E/./' > step1

# Compose an extended regular expression that matches any "extension"
# as found by the previous step.
EXT1="($(grep '^\.' step1 | xargs echo | sed -e 's/^\.//' -e 's/\+/\\+/g' -e 's/ \./|/g'))"

# Find all mime-types and related extensions that really exist.
egrep -i '^[[:alnum:]][^[:space:]]*[[:space:]]+([^[:space:]]+ )*'"$EXT1"'($| )' /etc/mime.types > step2

# Extract the list of extensions from the previous step,
# filtering out the extensions that we don't have.
for ext in $(sed -re 's/^[^[:space:]]*[[:space:]]+//' step2); do echo $ext; done | \
   egrep -i '^'"$EXT1"'$' | sort -u > step3

# Compose an extended regular expression from the previous step.
EXT2="($(cat step3 | xargs echo | sed -e 's/ /|/g'))"
 
# Find all "extensions" that weren't really extensions
# (or for which we don't know a MIME type).
grep '^\.' step1 | egrep -iv '^\.'"$EXT2"'$' > step4

# And turn it into an extended regular expression.
EXT3="($(sed -e 's/\./\\\\./' step4 | xargs echo | sed -e 's/ /|/g'))"

# Create a list of files for which no MIME type is known.
find $CVSREPOS/$PROJECTNAME -type f -name '*,v' ! -name '.cvsignore,v' | \
   sed -e 's%.*/\([^/]*\),v$%\1%' | egrep -i "$EXT3"'$' | sort -u > step5

Binary files in 'step5':

./macosx/app/app.icns:                                       data
./db/drivers/dbf/dbf_catalog/datetime.dbf:                   DBase 3 data file (2 records)
./imagery/i.atcorr/test_suite/ETM4_400x400_atms_corr.raw:    data
./imagery/i.atcorr/test_suite/ETM4_400x400.raw:              data
./macosx/app/English.lproj/MainMenu.nib/keyedobjects.nib:    Apple binary property list
./lib/proj/nzgd2kgrid0005.gsb:                               data
./db/drivers/dbf/dbf_catalog/river.dbf:                      DBase 3 data file (5 records)
./raster/r.slope.aspect/r_sl_asp_northangle_diffs.tar.gz:    gzip compressed data, was "r.slope.aspect-diffs.tar",  from Unix, last modified: Tue Jul 21 20:15:15 1998
./lib/vector/diglib/test.ok:                                 data

Add to mime.types:

application/dbase                          dbf
application/x-gtar                         gtar tgz taz tar.gz
application/octet-stream                   bin icns raw nib gsb ok
# Create a map from extension to MIME type. If a MIME type that starts
# with 'text' exist, use that - otherwise use application/octet-stream
# when there is more than one MIME type, or use the single known MIME type.
for f in $(cat step3); do \
   MIMETYPES=$(egrep -i '[[:space:]]'$f'( |$)' step2 | sed -e 's/[[:space:]].*//'); \
   echo $f: $MIMETYPES; done | \
   sed -e 's%:.* \(text/[^ ]*\).*%: \1%' -e 's%: [^ ].* .*%: application/octet-stream%' > step6
? b: chemical/x-molconn-Z (only ./raster/r.le/r.le.setup/polytocell/bmf.b, I guess it should be bmf.c...)
grass6/raster/r.le/r.le.setup/polytocell/bmf.b:               ASCII C program text
grass/src/raster/r.le/r.le.setup/polytocell/bmf.b:            ASCII C program text
grass/src/imagery/i.points3/inter/find.b:                     ASCII C program text
grass/src.contrib/SCS/paint/Programs/newp.map/cmd/scan_gis.b: ASCII C program text
grass/src.contrib/SCS/paint/Programs/newp.map/cmd/map.b:      ASCII C program text
grass/src.contrib/CERL/SGI/ISM/grid/gdwrit.b:                 ASCII text
grass/src.garden/grass.hdf/hdf3/HDF.lib.3.2.3/doc/HDF.apdx.b: ASCII English text
-> b: text/plain (text-csrc ?)

? bak: application/x-trash
grass/src.contrib/GMSL/sg4d/lightdefs.bak:                            ASCII C program text
grass/src.contrib/GMSL/g3d/src3d/raster/r3.showdspf.openGL/Viz.h.bak: ASCII C program text
-> bak: text/plain

? bat: application/x-msdos-program
grass6/scripts/windows_launch.bat:                            ASCII text
grass6/lib/init/grass.bat:                                    MS-DOS batch file text
grass6/lib/init/init.bat:                                     MS-DOS batch file text
grass6/lib/init/grass-run.bat:                                MS-DOS batch file text
grass6/visualization/nviz/scripts/nviz.bat:                   ASCII text
grass/src.contrib/CERL/raster/r.rational.regression/main.bat: ASCII C program text
grass/cygwin/startxgrass.bat:                                 MS-DOS batch file text
grass/cygwin/startxwingrass.bat:                              MS-DOS batch file text
-> bat: text/plain

c: text/x-csrc
cc: text/x-c++src
cpp: text/x-c++src
css: text/css
csv: text/csv

? dat: chemical/x-mopac-input
grass6/dist.i686-pc-linux-gnu/etc/nad/ntv1_can.dat:    data
grass6/misc/m.cogo/cogo.dat:                           ASCII text
grass6/lib/proj/ntv1_can.dat:                          data
grass6/lib/gis/fmode.dat:                              ASCII C program text
grass6/raster/r.statistics/gauss.dat:                  ASCII text
grass/src/libes/proj/ntv1_can.dat:                     data
grass/src/sites/s.qcount/tutorial/cls.dat:             ASCII text
grass/src/sites/s.qcount/tutorial/reg.dat:             ASCII text
grass/src/sites/s.qcount/tutorial/csr.dat:             ASCII text
grass/src/misc/m.cogo/cogo.dat:                        ASCII text
grass/src/raster/r.in.gridatb/example/elev.dat:        ASCII text
grass/src/raster/r.statistics/cmd/gauss.dat:           ASCII text
grass/src/paint/Drivers/versatec/patterns/ce3200.dat:  ASCII English text
grass/src.contrib/PURDUE/s.medp/doc/cressie.dat:       ASCII English text
grass/src.contrib/CERL/raster/nodenumber/xsect.dat:    ASCII text
grass/src.contrib/CERL/raster/nodenumber/yak_trap.dat: ASCII text
-> ?

dbf: application/dbase

? dir: application/x-director
grass6/raster/r.fill.dir:                     setgid directory
grass6/tools/cvs.rename.dir:                  POSIX shell script text executable
grass/src/raster/r.fill.dir:                  setgid directory
grass/src/tcltkgrass/module/r.fill.dir:       ASCII text
-> dir: text/plain (?)

eps: application/postscript
fig: application/x-xfig
gif: image/gif
gsb: application/octet-stream
h: text/x-chdr
hh: text/x-c++hdr
htm: text/html
html: text/html
icns: application/octet-stream
ico: image/x-icon
jpg: image/jpeg
lyx: application/x-lyx
man: application/x-troff-man
nib: application/octet-stream
ok: application/octet-stream

? old: application/x-trash
grass6/dist.i686-pc-linux-gnu/docs/html/gem/img1.old: PNG image data, 559 x 111, 8-bit colormap, interlaced
grass6/gem/docs/GEM-Manual/img1.old:                  PNG image data, 559 x 111, 8-bit colormap, interlaced
grass/unused/misc/m.clump/proto.h.old:                ASCII C program text
grass/src/raster/r.random.surface/MAN.old:            troff or preprocessor input text
grass/src/tcltkgrass/README.old:                      ASCII English text
-> ?

patch: text/x-diff
pdf: application/pdf
pl: text/x-perl
pm: text/x-perl
png: image/png
ps: application/postscript
py: text/x-python
raw: application/octet-stream
rgb: image/x-rgb
rtf: application/rtf
sh: text/x-sh

? src: application/x-wais-source
grass6/lib/init/grass-run.src:        POSIX shell script text executable
grass6/lib/init/grass.src:            POSIX shell script text executable
grass6/binaryInstall.src:             POSIX shell script text executable
grass/src/general/init/grass.src:     POSIX shell script text executable
grass/src/tcltkgrass/gis_set.tcl.src: ASCII English text
grass/binaryInstall.src:              POSIX shell script text executable
-> src: text/plain

? t: application/x-troff
grass6/swig/perl/t/R_slope_aspect.t: ASCII English text
-> t: text/plain

tcl: text/x-tcl
tex: text/x-tex
tgz: application/x-gtar
tiff: image/tiff
txt: text/plain
xbm: image/x-xbitmap

? xyz: chemical/x-xyz
grass6/scripts/r.out.xyz:                        setgid directory
grass6/scripts/r.out.xyz/r.out.xyz:              POSIX shell script text executable
grass6/raster/r.in.xyz:                          setgid directory
grass/src/raster/r.out.xyz:                      setgid directory
grass/src/tcltkgrass/module/r.out.xyz:           ASCII text
-> xyz: text/plain
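
Once the open cases are settled, the settled "extension: MIME type" lines of the list above could be converted into entries for the [auto-props] section of a Subversion config file, which takes lines of the form PATTERN = PROPNAME=VALUE. A minimal sketch; map_to_autoprops and the file name mimemap.txt are hypothetical, and lines marked with "?" or "->" are deliberately skipped as unresolved:

```shell
# Read "ext: type/subtype" lines on stdin and print auto-props entries.
# Anything else (questions, file listings, "->" proposals) is ignored.
map_to_autoprops() {
    sed -n 's|^\([A-Za-z0-9+_-]\{1,\}\): \([a-z]\{1,\}/[^ ]\{1,\}\)$|*.\1 = svn:mime-type=\2|p'
}

# Possible use (not run here):
#   map_to_autoprops < mimemap.txt
```

Subversion applies such entries only when enable-auto-props is switched on in the [miscellany] section of the same config file.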

External links

SVN hosting

There are two main options to host the new SVN repository.