GNU text tools

From GRASS-Wiki

Using GNU text tools for GIS data preparation

The GNU text tools cat, cut, join, head, more, paste, sed and tail, together with awk (a "pattern scanning and processing language"), offer many ways to modify ASCII texts and tables. Attribute tables are often delivered in ASCII formats such as CSV (comma-separated values) or blank-delimited text. The tools introduced here are especially helpful in scripts, where they automate text formatting.

Here we modify the Spearfish soils map legend, which is available on the GRASS website and is already included in the Spearfish sample data set (see /usr/local/share/grassdata/spearfish/soils_legend.txt). This legend is an ASCII table that contains further attributes for the soils map. We want to show how to turn this table into a reclass rules file usable with v.reclass and r.reclass. First, let's have a look at the file:

    more soils_legend.txt

Within the more program, continue to the next page with <SPACE>, quit with <q>, and search for a phrase with </>. The file looks like this:

    0:no data:
    1:AaB:Alice fine sandy loam, 0 to 6
    2:Ba:Barnum silt loam
    3:Bb:Barnum silt loam, channeled
    4:BcB:Boneek silt loam, 2 to 6
    5:BcC:Boneek silt loam, 6 to 9
    6:BeE:Butche stony loam, 6 to 50
    [...]

The legend columns are separated by ":". The first column stores the category numbers (attribute IDs). In the second column, the first letter, always a capital, is the initial letter of the soil name. The second letter is a capital if the mapping unit is broadly defined; otherwise, it is a small letter. The third letter, always a capital (A, B, C, D, E or F), indicates the slope. Symbols without a slope letter belong to mapping units that do not have slope as part of the name. The third column holds the full name of the soil along with the typical slope.

First we want to reduce the legend to attribute ID, soil name initials and text attribute without the slope information (this may go into another table or be derived from the Spearfish elevation.dem). Note that we proceed step by step, although you can also combine the commands into a few lines (or even a single line). The cut tool cuts column-wise according to the specified delimiter. Because the slope information follows a comma, we specify the delimiter "," (parameter d) and select the first field (parameter f), then redirect the result into a new file:

    cut -d',' -f1 soils_legend.txt > soils_legend2.txt

Checking the new file with more shows us:

    0:no data:
    1:AaB:Alice fine sandy loam
    2:Ba:Barnum silt loam
    3:Bb:Barnum silt loam
    4:BcB:Boneek silt loam
    5:BcC:Boneek silt loam
    6:BeE:Butche stony loam
    [...]
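To try this step without the Spearfish data set, the following minimal sketch recreates a few legend lines and applies the same cut call. The file names soils_sample.txt and soils_sample2.txt are made up for this illustration; the files are written to the current directory.

```shell
# A few lines copied from the legend shown above (not the full file).
cat > soils_sample.txt <<'EOF'
0:no data:
1:AaB:Alice fine sandy loam, 0 to 6
2:Ba:Barnum silt loam
3:Bb:Barnum silt loam, channeled
EOF

# -d',' sets the delimiter to a comma; -f1 keeps the text before the first comma.
# Lines that contain no comma at all are passed through unchanged.
cut -d',' -f1 soils_sample.txt > soils_sample2.txt

cat soils_sample2.txt
```

Lines without a delimiter are printed as they are, which is why the entries that carry no slope information survive this step.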

Starting from the new file we will select only the text label:

    cut -d':' -f3 soils_legend2.txt > soils_legendlabels.txt

Checking the new file with more shows us:

    Alice fine sandy loam
    Barnum silt loam
    Barnum silt loam
    Boneek silt loam
    Boneek silt loam
    Butche stony loam
    [...]

Note that the first line is empty since the no-data field doesn't contain a text label. Alternatively, you can combine the steps above into one command:

    cut -d',' -f1 soils_legend.txt | cut -d':' -f3 > soils_legendlabels.txt
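As a quick check that the pipeline really matches the two-step approach, the sketch below runs both variants on a small sample and compares the results. All file names here (legend_demo.txt, labels_twostep.txt, labels_pipe.txt) are hypothetical and only serve this illustration.

```shell
# Sample lines copied from the legend shown earlier; written to the current directory.
printf '%s\n' \
  '0:no data:' \
  '1:AaB:Alice fine sandy loam, 0 to 6' \
  '3:Bb:Barnum silt loam, channeled' > legend_demo.txt

# Variant 1: two steps with an intermediate file.
cut -d',' -f1 legend_demo.txt > legend_demo2.txt
cut -d':' -f3 legend_demo2.txt > labels_twostep.txt

# Variant 2: the same work as a single pipeline, no intermediate file.
cut -d',' -f1 legend_demo.txt | cut -d':' -f3 > labels_pipe.txt

# cmp -s prints nothing and succeeds only if both files are byte-identical.
if cmp -s labels_twostep.txt labels_pipe.txt; then
  echo "identical"
fi
```

The pipeline avoids the intermediate file, which is convenient in scripts; the step-by-step form is easier to inspect while developing.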

Some further hints: to cut at specific byte positions rather than at a delimiter, use the -b flag. Here we select bytes 1 to 3 of each line:

    cut -b1-3 soils_legend.txt
    0:n
    1:A
    2:B
    3:B
    4:B
    5:B
    6:B
    [...]

If you want to see only the first lines of a file, use head. The number of lines to display is given with a preceding minus character (the equivalent form head -n 4 is the spelling specified by POSIX):

    head -4 soils_legend.txt

The result looks as follows:

    0:no data:
    1:AaB:Alice fine sandy loam, 0 to 6
    2:Ba:Barnum silt loam
    3:Bb:Barnum silt loam, channeled

If you want to see only the last lines, use tail. It is used similarly to head:

    tail -3 soils_legend.txt

leading to the output:

    53:WaA:Weber loam, 0 to 2
    54:Wb:Winetti cobbly loam
    55:water

Both may be combined to show a portion of the text file (here only lines 3 to 5):

    head -5 soils_legend.txt | tail -3

Here the head command shows the first five lines, the tail command the last three of these five lines:

    2:Ba:Barnum silt loam
    3:Bb:Barnum silt loam, channeled
    4:BcB:Boneek silt loam, 2 to 6
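The head and tail combination generalizes to any line range, and sed offers an alternative for the same task. The following is a sketch on a small sample; the variable names first and last and the file names are assumptions made for this example.

```shell
# Seven sample lines copied from the legend listing above.
printf '%s\n' \
  '0:no data:' \
  '1:AaB:Alice fine sandy loam, 0 to 6' \
  '2:Ba:Barnum silt loam' \
  '3:Bb:Barnum silt loam, channeled' \
  '4:BcB:Boneek silt loam, 2 to 6' \
  '5:BcC:Boneek silt loam, 6 to 9' \
  '6:BeE:Butche stony loam, 6 to 50' > range_demo.txt

first=3   # first line wanted (hypothetical variable name)
last=5    # last line wanted (hypothetical variable name)

# head keeps lines 1..last; tail then keeps the final (last - first + 1) of them.
head -n "$last" range_demo.txt | tail -n "$((last - first + 1))" > range_headtail.txt

# sed -n suppresses the default output; "M,Np" prints only lines M through N.
sed -n "${first},${last}p" range_demo.txt > range_sed.txt

cat range_headtail.txt
```

Both variants write the same three lines; the sed form states the range directly, while the head and tail form needs the small subtraction.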

In order to sequentially concatenate two files, use cat:

    cat file1 file2 > file1and2

If you need to paste two files column-wise, use paste. You can optionally change the column delimiter from the default tab to another character. The join command works like a simple database management system: it joins ASCII tables on matching entries in a common column.
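A minimal sketch of both commands, using values taken from the legend shown earlier. The file names (ids.txt, symbols.txt, symbols_table.txt, names_table.txt and the output files) are invented for this illustration.

```shell
# Two single-column files built from values in the legend.
printf '%s\n' 1 2 3 > ids.txt
printf '%s\n' AaB Ba Bb > symbols.txt

# paste glues the files together line by line; -d replaces the default tab delimiter.
paste -d':' ids.txt symbols.txt > pasted.txt
cat pasted.txt

# Two tables that share the category number as key in column 1.
# join expects both inputs to be sorted on the join field (as text, not numerically).
printf '%s\n' '1:AaB' '2:Ba' '3:Bb' > symbols_table.txt
printf '%s\n' '1:Alice fine sandy loam' '2:Barnum silt loam' '3:Barnum silt loam' > names_table.txt

# -t sets the field separator used for both input and output.
join -t':' symbols_table.txt names_table.txt > joined.txt
cat joined.txt
```

paste matches lines purely by position, whereas join matches them by key, so join is the safer choice when the two tables might not list the same records in the same order.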

A powerful stream editor is sed, which lets you replace, add or cut off strings in text files by rule definitions. With awk, which we already used throughout the book, you can perform calculations or formatted printing. For details, please refer to the related manual pages.
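To give a first impression of both tools, here is a small sketch on three legend lines. The file names are hypothetical, and the particular rules (stripping the slope text, aligning columns, summing the category numbers) are chosen only for illustration.

```shell
# Three sample lines copied from the legend shown earlier.
printf '%s\n' \
  '1:AaB:Alice fine sandy loam, 0 to 6' \
  '2:Ba:Barnum silt loam' \
  '3:Bb:Barnum silt loam, channeled' > edit_demo.txt

# sed: substitute (s) the first comma and everything after it with nothing.
sed 's/,.*$//' edit_demo.txt > edit_sed.txt

# awk: -F sets the field separator; printf produces aligned, formatted output.
awk -F':' '{ printf "%-3s %-4s %s\n", $1, $2, $3 }' edit_sed.txt > edit_awk.txt

# awk: a small calculation, here the sum of the category numbers in column 1.
awk -F':' '{ sum += $1 } END { print sum }' edit_demo.txt > edit_sum.txt

cat edit_sed.txt edit_awk.txt edit_sum.txt
```

The sed rule has the same effect as the earlier cut call with a comma delimiter; awk becomes the better fit once fields need to be rearranged or computed.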

See also