Temporal data processing

From GRASS-Wiki
Revision as of 19:39, 23 September 2014 by Veroandreo (talk | contribs)

Introduction

TGRASS is the temporally enabled GRASS GIS, available from GRASS GIS 7 onwards. TGRASS is completely metadata based: it manages the temporal and spatial extent of maps, including their temporal topology.

Terminology Overview

  • Space time raster datasets (strds) are designed to manage raster map time series. Modules that process strds have the naming prefix t.rast.
  • Space time 3D raster datasets (str3ds) are designed to manage 3D raster map time series. Modules that process str3ds have the naming prefix t.rast3d.
  • Space time vector datasets (stvds) are designed to manage vector map time series. Modules that process stvds have the naming prefix t.vect.

Example workflow for a Chlorophyll-a MODIS time series

The following examples are based on a time series of the MODIS L3 Chlorophyll-a product, which is freely available at the ocean color site. Say we download the SMI 8-day composite product at 4.6 km resolution for the period 2003-2013. That is a set of 506 images, 46 per year. The data come as compressed HDF4 files. Chlorophyll product filenames look like this:

A20030012003008.L3m_8D_CHL_chlor_a_4km

 A: MODIS/Aqua
 2003: Year at start
 001: Day of year (DOY) at start
 2003: Year at end
 008: Day of year (DOY) at end
 L3m: Level 3 data, mapped (Projection: Plate carrée)
 8D: 8-day composite
 CHL: Chlorophyll a concentration product
 chlor_a: algorithm used 10^(a0 + a1*X + a2*X^2 + a3*X^3 + a4*X^4) 
 4km: 4.6 km pixel size (8640x4320 image, 2.5 arc-minute resolution)
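
The naming convention above can be decoded programmatically. The following is a minimal Python sketch, assuming every filename follows exactly the pattern shown; the function names `doy_to_date` and `parse_name` and the dictionary keys are hypothetical and only used for illustration.

```python
import datetime


def doy_to_date(year, doy):
    """Convert a year and a day of year (1-366) into a calendar date."""
    return datetime.date(year, 1, 1) + datetime.timedelta(days=doy - 1)


def parse_name(name):
    """Split a name such as A20030012003008.L3m_8D_CHL_chlor_a_4km into its fields."""
    stem, rest = name.split(".", 1)
    level, period, suite, remainder = rest.split("_", 3)
    product, resolution = remainder.rsplit("_", 1)
    return {
        "sensor": stem[0],
        "first_day": doy_to_date(int(stem[1:5]), int(stem[5:8])),
        "last_day": doy_to_date(int(stem[8:12]), int(stem[12:15])),
        "level": level,
        "period": period,
        "suite": suite,
        "product": product,
        "resolution": resolution,
    }


info = parse_name("A20030012003008.L3m_8D_CHL_chlor_a_4km")
print(info["first_day"], info["last_day"], info["product"])
```

Note that `last_day` is the last day covered by the composite (inclusive), which is not the same as the exclusive end time used later for registration.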

We now decompress the files and check their metadata:

# Go to where the data is and decompress
  find . -iname '*.bz2' -exec bzip2 -d {} \; 
# Check file meta-data (GDAL utilities)
  gdalinfo A20030012003008.L3m_8D_CHL_chlor_a_4km

Next step is to import all 506 images into GRASS. You can use r.in.gdal or r.external for that. Note that global Cl-a images as downloaded from the ocean color site are ~150 MB each, so disk space can become an issue. Here are three different options:

  • import global images (506 images, 150 MB each) with r.in.gdal, resize to study area and remove global files
  • set projection and extent, and resize to study area with gdal_translate, and import the already resized images with r.in.gdal
  • import global images (506 images, 150 MB each) with r.external, resize to study area and remove global files, as shown next:
# define region extent
g.region -p n=-38 s=-55 w=-70 e=-56

suffix=_tmp
for map in *chlor*; do
  r.external input=$map output=${map}${suffix} -o ;
  r.mapcalc expression="$map=${map}${suffix}" ;
  g.remove rast=${map}${suffix} ;
done

Time series processing

Once we have maps inside GRASS we can start the temporal processing. If this is the first time you'll use temporal modules, you need to run

  t.connect -d

to set the default temporal GIS database connection for the current mapset. The default TGIS database of type sqlite3 is located in the PERMANENT mapset directory. Temporal GIS content from all created mapsets will be stored there.

1. Creating a STRDS and registering maps

The first step is to create a space time dataset by means of t.create. Let us create a strds for the Chlorophyll-a (Cl-a) time series. We need to define the type (raster, 3D raster or vector), whether the time is absolute or relative, and the name of the space time dataset.

t.create type=strds temporaltype=absolute output=cla title="Chlorophyll-a concentration" \
  description="MODIS L3 Chlorophyll-a concentration for Argentinian sea"

Then, we register our 506 raster maps in the strds using t.register. This module assigns time stamps to raster, 3D raster and vector maps and registers them in space time datasets. Existing timestamps can be read and used by t.register.

This module supports absolute and relative time. Maps can be registered by command line argument (a list of comma separated map names) or using an input file. The start time, end time and a temporal increment can be provided on the command line or in the input file. End time and increment are mutually exclusive. Maps can be registered in several space time datasets using the same timestamp.

Start time and end time with absolute time must be provided using the format yyyy-mm-dd HH:MM:SS +HHMM. Specifying only the date (yyyy-mm-dd) is also supported. In case of relative time, the temporal unit (years, months, days, hours, minutes or seconds) must be provided. The relative start time, end time and the increment are integers.

t.register -i type=rast input=cla maps=`g.list rast pat=*_chlor_* sep=,` \
  start="2003-01-01" increment="8 days"

This would have been the simplest solution, but 8-day products have a problem: the last image of each year is not an 8-day composite but covers only 5 or 6 days (days 361 to 365 or 366). With the increment parameter, dates are therefore not set properly from the second year onwards. The solution is to create a list of maps with their respective start and end dates. As the filenames contain the year and DOY (day of year), we can use the following Python script to read the filenames and transform DOY into calendar dates (Thanks Soeren!).

# in Python
import datetime
from os import walk

# collect the file names found directly in the data directory
f = []
for (dirpath, dirnames, filenames) in walk("/path/to/the/maps"):
    f.extend(filenames)
    break
# sort the list so that maps are in chronological order
f.sort()
print(f)

for map_name in f:
    # characters 1-14 hold start year, start DOY, end year and end DOY
    start_year = int(map_name[1:5])
    start_day = int(map_name[5:8])
    end_year = int(map_name[8:12])
    end_day = int(map_name[12:15])
    start = datetime.datetime(start_year, 1, 1) + datetime.timedelta(start_day - 1)
    # the interval ends at midnight after the last day of the composite
    end = datetime.datetime(end_year, 1, 1) + datetime.timedelta(end_day)
    print(map_name + '|' + str(start) + '|' + str(end))

Using the fixed character positions in the filenames and the datetime library in Python, you can convert the DOY in the filenames into the start_time and end_time that t.register expects in its input list. The resulting list looks like this:

 A20030012003008.L3m_8D_CHL_chlor_a_4km_arg|2003-01-01 00:00:00|2003-01-09 00:00:00
 A20030092003016.L3m_8D_CHL_chlor_a_4km_arg|2003-01-09 00:00:00|2003-01-17 00:00:00
 A20030172003024.L3m_8D_CHL_chlor_a_4km_arg|2003-01-17 00:00:00|2003-01-25 00:00:00
 ...
 A20133452013352.L3m_8D_CHL_chlor_a_4km_arg|2013-12-11 00:00:00|2013-12-19 00:00:00
 A20133532013360.L3m_8D_CHL_chlor_a_4km_arg|2013-12-19 00:00:00|2013-12-27 00:00:00
 A20133612013365.L3m_8D_CHL_chlor_a_4km_arg|2013-12-27 00:00:00|2014-01-01 00:00:00

and then, the command would be:

t.register --o type=rast input=cla file=map_list

We can also set a color palette for all maps in the strds with:

t.rast.colors input=cla color=name_of_color_table
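
The drift caused by a fixed 8-day increment, described earlier in this section, can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes that the composites restart on 1 January of every year, as the filenames suggest.

```python
import datetime


def fixed_increment_starts(first, n, step_days=8):
    """Start dates produced by a constant increment that ignores year boundaries."""
    return [first + datetime.timedelta(days=step_days * i) for i in range(n)]


def composite_starts(first_year, last_year, step_days=8):
    """Start dates of 8-day composites that restart on 1 January every year."""
    starts = []
    for year in range(first_year, last_year + 1):
        day = datetime.date(year, 1, 1)
        while day.year == year:
            starts.append(day)
            day += datetime.timedelta(days=step_days)
    return starts


real = composite_starts(2003, 2013)
naive = fixed_increment_starts(datetime.date(2003, 1, 1), len(real))

print(len(real))            # number of composites in 2003-2013
print(real[46], naive[46])  # first composite of 2004: real vs. fixed increment
```

The two lists agree for the whole first year and diverge from the first map of 2004 onwards, which is why the explicit list of start and end dates is used instead.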

2. Getting some basic info and statistics

We now check the space time data sets we have in our mapset with:

t.list type=strds

and list information about our recently created strds. See t.info for additional uses.

t.info type=strds input=cla
 +-------------------- Space Time Raster Dataset -----------------------------+
 |                                                                            |
 +-------------------- Basic information -------------------------------------+
 | Id: ........................ cla@clorofila
 | Name: ...................... cla
 | Mapset: .................... clorofila
 | Creator: ................... veroandreo
 | Temporal type: ............. absolute
 | Creation time: ............. 2014-04-29 14:23:00.579342
 | Modification time:.......... 2014-05-12 09:15:08.917309
 | Semantic type:.............. mean
 +-------------------- Absolute time -----------------------------------------+
 | Start time:................. 2003-01-01 00:00:00
 | End time:................... 2014-01-01 00:00:00
 | Granularity:................ 1 day
 | Temporal type of maps:...... interval
 +-------------------- Spatial extent ----------------------------------------+
 | North:...................... -38.0
 | South:...................... -55.0
 | East:....................... -55.0
 | West:....................... -70.0
 | Top:........................ 0.0
 | Bottom:..................... 0.0
 +-------------------- Metadata information ----------------------------------+
 | Raster register table:...... raster_map_register_91ba57d5f0924f4fa0bd7176a1b39b2f
 | North-South resolution min:. 0.041667
 | North-South resolution max:. 0.041667
 | East-west resolution min:... 0.041667
 | East-west resolution max:... 0.041667
 | Minimum value min:.......... 0.02925
 | Minimum value max:.......... 0.26472
 | Maximum value min:.......... 5.2104
 | Maximum value max:.......... 99.953934
 | Aggregation type:........... None
 | Number of registered maps:.. 506
 |
 | Title:
 | Chlorophyll-a
 | Description:
 | Concentracion de Clorofila a
 | Command history:
 | # 2014-04-29 14:23:00 
 | t.create type="strds" temporaltype="absolute"
 |     output="cla" title="Chlorophyll-a"
 |     description="Concentracion de Clorofila a" --o
 | # 2014-04-29 14:23:23 
 | t.register --o type="rast" input="cla"
 |     file="map_list"
 | 
 +----------------------------------------------------------------------------+

Now, we get univariate statistics from the non-null cells of each registered raster map of the strds. For that we use t.rast.univar which, by default, returns the name of the map, its start and end date and the following values: mean, minimum and maximum value, mean_of_abs, standard deviation, variance, coeff_var, number of null cells and total number of cells.

t.rast.univar input=cla

In Linux-based systems you can send the output to a text file using

t.rast.univar input=cla > stats_cla

3. Listing maps and selections

The module t.rast.list allows you to list registered maps of a strds and provides several options to achieve what you want. For example, you can select different granules and also perform different queries by means of the where parameter. Some examples are:

t.rast.list input=cla method=gran granule="1 month"
# this will give one image every one month, 3 months, 1 year, or whatever granule you choose

t.rast.list input=cla order=min columns=id,name,start_time,min where="min <= '0.05'" 
# this will order by minimum value all the maps in the strds that have a minimum value lower than or equal to 0.05

t.rast.list input=cla order=max columns=name,start_time,max where="max > '10.0'"
# maps ordered by maximum value in which maximum value is higher than 10.

t.rast.list input=cla where="start_time >= '2003-01-01' and start_time < '2003-07-01'" 
# all the maps in the first 6 months of the time series

In the where parameter you can use SQL datetime functions. For example, to get all maps whose start date is in January, we can do:

t.rast.list input=cla where="strftime('%m', start_time)='01'"

# or 

t.rast.list input=cla \
  where='start_time >= datetime(start_time, "start of year") and start_time <= datetime(start_time, "start of year", "1 month")'

If you have monthly data (instead of 8-day products) and you want to list all January maps, you can do:

t.rast.list input=cla_orig where="start_time=datetime(start_time, 'start of year', '0 month')"
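
Since the where parameter is evaluated by SQLite, conditions can be tried out on a small in-memory table before running them on a real strds. The sketch below uses Python's sqlite3 module; the table `maps` and its two columns are a hypothetical stand-in and not the actual layout of the temporal database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in table; the real temporal database has a richer schema.
con.execute("CREATE TABLE maps (name TEXT, start_time TEXT)")
con.executemany(
    "INSERT INTO maps VALUES (?, ?)",
    [
        ("m1", "2003-01-01 00:00:00"),
        ("m2", "2003-01-25 00:00:00"),
        ("m3", "2003-02-02 00:00:00"),
        ("m4", "2004-01-09 00:00:00"),
        ("m5", "2004-06-01 00:00:00"),
    ],
)


def select(where):
    """Return the names of the maps that satisfy a where condition."""
    sql = "SELECT name FROM maps WHERE " + where + " ORDER BY start_time"
    return [row[0] for row in con.execute(sql)]


# Two ways of selecting maps that start in January of any year
january_a = select("strftime('%m', start_time) = '01'")
january_b = select(
    "start_time >= datetime(start_time, 'start of year') "
    "AND start_time < datetime(start_time, 'start of year', '1 month')"
)
# Maps that start in the first half of 2003
first_half = select("start_time >= '2003-01-01' AND start_time < '2003-07-01'")

print(january_a, january_b, first_half)
```

Timestamps are compared as text here, so a bound such as '2003-06' sorts before every timestamp in June; spelling out full dates avoids that surprise.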

4. Visualization

There are different visualization options for strds.

  • g.gui.timeline lets you compare temporal datasets by displaying their temporal extents in a plot.
# only temporal extent
g.gui.timeline inputs=cla
# temporal and spatio-temporal extent
g.gui.timeline -3 inputs=cla
Figure: temporal and spatio-temporal extent of the cla strds.
  • g.gui.tplot lets you see the values of one or more temporal datasets for a queried point defined by a coordinate pair.

Steps to use this module are:

  1. Select strds
  2. Select pair of coordinates (east,north) or point in the map
  3. Hit Run
  4. Customize as desired
  5. Save
Figure: time series plot (Chlorophyll vs Time) for a certain coordinate pair in the study area.
  • g.gui.animation is the tool for animating a series of raster and vector maps or a space time raster or vector dataset.
g.gui.animation strds=cla

5. Aggregation

For aggregations of data with different methods and different granularities, there are two very useful commands:

  • t.rast.series, which applies the aggregation methods of r.series to all or a subset of the raster maps in a strds, and
  • t.rast.aggregate, which temporally aggregates the maps in a strds by a user defined granularity.

With these modules it is very simple to get maps of basic statistical parameters for different temporal granules, and this permits the analysis of the spatio-temporal variability of the variable of interest.

Some examples:

# yearly aggregation
t.rast.aggregate input=cla output=cla_yearly_average \
  base=cla_yearly_average granularity="1 years" \
  method=average sampling=start

# yearly aggregation with corresponding methods (output: 7 strds with 11 maps each)
for method in average median mode minimum maximum stddev range ; 
do
t.rast.aggregate input=cla output=cla_yearly_${method} \
                 base=cla_yearly_${method} granularity="1 years" \
                 method=${method} sampling=start
done
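
To make the grouping behind a yearly aggregation concrete, here is a small Python sketch for a single pixel. It assumes that each map is assigned to the year of its start date, which is how we read sampling=start; the function name and the sample values are made up for illustration.

```python
import datetime
from collections import defaultdict


def aggregate_by_year(samples):
    """Average (start_date, value) pairs grouped by the year of the start date."""
    groups = defaultdict(list)
    for start, value in samples:
        groups[start.year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in sorted(groups.items())}


samples = [
    (datetime.date(2003, 1, 1), 1.0),
    (datetime.date(2003, 12, 27), 3.0),  # ends on 2004-01-01 but starts in 2003
    (datetime.date(2004, 1, 1), 5.0),
]
yearly = aggregate_by_year(samples)
print(yearly)
```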

With the where parameter, we can select all 8-day products whose start_time falls in a given month (01 for January) across all years. In this way we obtain the so-called climatologies.

t.rast.series input=cla method=average where="strftime('%m', start_time)='01'" output=january_average

Generalizing a bit, we can:

# climatologies for every month 
for i in 01 02 03 04 05 06 07 08 09 10 11 12 ; do 
  for m in average median mode stddev range minimum maximum ; do 
    t.rast.series input=cla method=${m} where="strftime('%m', start_time)='${i}'" output=${m}_${i}
  done
done

Using the climatologies obtained previously, we'll now estimate monthly anomalies in the mean, max and min Cl-a concentration. First, we need to aggregate the data monthly, and then compute the difference between the monthly climatology and each respective monthly aggregate. We'll do the aggregation for the average, minimum and maximum of Cl-a concentration (from 506 input maps in the cla strds, we'll get 132 maps in each new monthly aggregated strds).

# monthly aggregates (132 maps)

for method in average minimum maximum ; do
t.rast.aggregate input=cla output=cla_monthly_${method} base=cla_monthly_${method} \
  granularity="1 months" method=${method} sampling=contains
done

# January anomalies in mean, min and max Cl-a

t.rast.list -s input=cla_monthly_average where="start_time=datetime(start_time, 'start of year', '0 month')" columns=name

for m in average minimum maximum ; do 
  for i in 1 13 25 37 49 61 73 85 97 109 121 ; do # these numbers correspond to all january monthly aggregates
    r.mapcalc expression="Jan_${m}_anomaly_${i} = ${m}_01 - cla_monthly_${m}_${i}" 
  done
done
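
The list of January indexes used in the loop above does not have to be typed by hand. A minimal Python sketch, assuming that the monthly aggregates are numbered consecutively from 1 and that no month is missing (the function name is hypothetical):

```python
def month_indexes(month, n_years, first_index=1):
    """Indexes of one calendar month (1-12) in a gap-free monthly series."""
    return [first_index + (month - 1) + 12 * year for year in range(n_years)]


january = month_indexes(1, 11)
january_maps = ["cla_monthly_average_%d" % i for i in january]
print(january)
print(january_maps[:2])
```

If the numbering of the aggregated maps starts at a different value, `first_index` has to be adjusted accordingly.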

You can also use t.rast.list for looping, the same way you use g.mlist:

for map in `t.rast.list -s input=cla_monthly_average where="start_time=datetime(start_time, 'start of year', '0 month')" columns=name`; 
do
r.mapcalc expression="anomaly_${map} = average_01 - ${map}" 
done

Say we now need to know the date of the maximum value of Cl-a concentration over all the study period and/or on a yearly basis:

# map index for the overall maximum Cl-a value 
t.rast.series input=cla method=max_raster output=cla_max_index

# map index for the yearly maximum Cl-a value 
t.rast.aggregate input=cla granularity="1 year" method=max_raster output=yearly_max_index basename=yearly_max_index

The outputs show, pixelwise, the index of the map in which the maximum value of Cl-a occurs. The index counts from 0, so it ranges from 0 to 505 for the whole time series and from 0 to 45 on a yearly basis. For relative time this may be enough, but you may then want to reclassify the data to get DOY, for example. In that case, you may use r.reclass.
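
Before writing reclassification rules it helps to tabulate which date each map index stands for. The sketch below assumes 46 composites per year starting in 2003 and an index that counts from 0; if your output counts from 1, pass `first_index=1`. The function name and constants are illustrative only.

```python
import datetime

MAPS_PER_YEAR = 46  # 8-day composites per year
FIRST_YEAR = 2003   # first year of the time series


def index_to_date(index, first_index=0):
    """Return (start date, DOY) of the map at a given position in the series."""
    position = index - first_index
    year = FIRST_YEAR + position // MAPS_PER_YEAR
    doy = 1 + 8 * (position % MAPS_PER_YEAR)
    date = datetime.date(year, 1, 1) + datetime.timedelta(days=doy - 1)
    return date, doy


print(index_to_date(0))
print(index_to_date(505))
```

The resulting index-to-DOY pairs can then be turned into the rules passed to r.reclass.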

If you already have monthly data, you can get climatologies quite simply as follows:

# January averages
t.rast.series input=cla_monthly method=average where="start_time=datetime(start_time, 'start of year', '0 month')" output=jan_average

6. Spatio-temporal algebra with STRDS

The module t.rast.mapcalc allows us to perform spatio-temporal mapcalc expressions on temporally sampled maps of strds. There are spatial and temporal operators available for the "expression" string. Spatial operators, functions and internal variables are those used in r.mapcalc. Temporal internal variables supported for both relative and absolute time include: td(), start_time() and end_time(). There are also several very useful internal variables that are supported only for absolute time and refer to the current sample interval or instance, e.g.: start_doy(), start_year(), start_month() and so on (see the t.rast.mapcalc manual page for further details and examples).

Some examples now. Say we did some analysis and decided that we will only consider values higher than 0.05. Then, we need to set all values below that threshold to null.

t.rast.mapcalc input=cla expression="if(cla < 0.05, null(), cla)" output=cla_corrected  basename=cla_corrected

or we may also want to set erroneous negative values to the known minimum of the strds, so:

t.rast.series input=cla method=minimum output=min_cla
t.rast.mapcalc input=cla expression="if(cla < 0.0, min_cla, cla)" output=cla_corrected  basename=cla_corrected

We may also need to reclassify all maps in the strds according to a certain threshold, e.g.: a certain level of Cl-a that indicates bloom conditions, in order to get bloom frequency afterwards:

# reclassify
t.rast.mapcalc -n input=cla output=cla_bloom basename=cla_bloom expression="if(cla > 0.75, 1, null())"
# bloom frequency
t.rast.series input=cla_bloom output=bloom_freq method=count
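
What the two commands above compute can be illustrated on a tiny series of 2x2 grids in plain Python, with None standing in for null cells. This is only a sketch of the idea (threshold each map, then count the non-null cells per pixel); the function names and values are made up.

```python
def bloom_mask(grid, threshold=0.75):
    """1 where the value exceeds the threshold, None (null) elsewhere."""
    return [
        [1 if v is not None and v > threshold else None for v in row]
        for row in grid
    ]


def count_non_null(grids):
    """Per-cell count of non-null values across equally sized grids."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(g[r][c] is not None for g in grids) for c in range(cols)]
        for r in range(rows)
    ]


series = [
    [[0.2, 0.9], [None, 1.5]],
    [[0.8, 0.7], [0.3, 2.0]],
    [[1.1, 0.1], [None, 0.76]],
]
freq = count_non_null([bloom_mask(g) for g in series])
print(freq)
```

In this sketch a pixel that never exceeds the threshold gets a count of 0.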

Do you remember we wanted to get the DOY of maximum Cl-a value before? Well, here's another way of doing it...

# overall maximum value
t.rast.series input=cla method=maximum output=max_cla
# new strds with DOY of overall maximum 
t.rast.mapcalc -n inputs=cla output=date_max_cla expression="if(cla == max_cla,start_doy(),null())" basename=date_max_cla 
# map with DOY of overall maximum
t.rast.series input=date_max_cla method=maximum output=max_cla_date
# remove date_max_cla strds (we were only interested in the resulting aggregated map)
t.remove -rf inputs=date_max_cla

In the development version of GRASS GIS, there's also a t.rast.mapcalc2 module that allows for temporal and spatial operations on strds by means of temporal raster algebra. The module expects an expression as input parameter in the following form:

"result = expression"

The statement structure is similar to that of r.mapcalc: the result is the name of a new strds that will contain the result of the calculation given as expression. Expressions can be any valid or nested combination of temporal operations and spatial overlay or buffer functions that are provided by the temporal algebra. See the manual for further details and explanations.

We'll use this module to estimate the rate of change (slope) between every pair of consecutive maps in the "cla" strds. The result will be a new strds whose maps hold the slope between each consecutive pair of maps in the original strds:

t.rast.mapcalc2 expression="slope_cla = (cla[1]-cla[0])/8.0" basename=slope_cla
# we set 8 as fixed denominator, because products are 8-day compositions

We can then use any of the aggregation modules that we saw before to get the maximum or minimum rate of change for different granularities.
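
For a single pixel, the slope expression used above amounts to dividing the difference between consecutive values by the 8-day step. A minimal Python sketch with made-up values (the function name is hypothetical):

```python
def slopes(values, step_days=8.0):
    """Rate of change between each pair of consecutive values."""
    return [(b - a) / step_days for a, b in zip(values, values[1:])]


pixel_series = [1.0, 3.0, 2.0]
rates = slopes(pixel_series)
print(rates)
```

A series of n values yields n - 1 slopes, so the resulting strds has one map fewer than the input. The fixed denominator is also slightly off for the pair that spans a year boundary, because the last composite of each year is shorter than 8 days.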

7. Subsetting and something else

t.rast.extract is another really useful module in temporal GRASS. It extracts a subset of a strds and stores it in a different strds. You use the "where" condition to define the subset, but you can also specify an r.mapcalc sub-expression that performs operations on all maps of the selected subset. If no r.mapcalc expression is defined, the selected maps are simply registered in the new output strds.

Say we need to know in how many maps of the strds the minimum values are below a threshold and, in addition, how many pixels per map meet that condition. Then, we can do

t.rast.extract input=cla where="min <= '0.05'" output=cla_less_05 basename=cla_less_05 expression="if(cla<0.05,1,null())"

to extract those maps with a minimum value lower than 0.05, and in the same step put 1 in every cell meeting the criterion and null everywhere else. To get a count of maps and pixels meeting the condition we may use:

# univariate statistics to get the count per map  
t.rast.univar input=cla_less_05 
# count map of cells with min value < 0.05
t.rast.series input=cla_less_05 output=count_cla_less_05 method=count

8. Importing / Exporting strds

Say we now need to do some processing in R (e.g.: run DINEOF to fill gaps in data), so we need to export our strds, hence we use t.rast.export.

t.rast.export input=cla output=cla4R compression=gzip

After we have done our analysis, we can bring the whole strds back into GRASS: we export it from R with write.tgrass (read.tgrass and write.tgrass belong to the spacetime R package) and import it with t.rast.import in GRASS.

t.rast.import input=cla_from_R.tar.gz output=cla_from_R base=new_map extrdir=/tmp

9. Neighborhood analysis

t.rast.neighbors performs a neighborhood analysis for each map in a space time raster dataset. This module supports a subset of the options already available in r.neighbors. Both the size of the neighborhood and the aggregation method can be chosen. As an example, we'll estimate mean Cl-a concentration in a 3x3 neighborhood for every map in the strds.

t.rast.neighbors input=cla output=cla_smooth base=cla_smooth method=average size=3
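
As an illustration of what a 3x3 average does to one map, here is a plain Python sketch on a small grid, with None standing in for null cells. It assumes that null cells and cells outside the map are simply left out of the mean; this is a choice made for the sketch, so check the r.neighbors manual for the exact edge and null handling.

```python
def neighborhood_mean(grid, size=3):
    """Mean of the non-null cells in a size x size window around each cell."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
                if grid[rr][cc] is not None
            ]
            out_row.append(sum(window) / len(window) if window else None)
        out.append(out_row)
    return out


grid = [
    [1.0, 2.0, 3.0],
    [4.0, None, 6.0],
    [7.0, 8.0, 9.0],
]
smoothed = neighborhood_mean(grid)
print(smoothed[1][1])
```

With this handling the null cell in the centre is filled from its eight neighbours.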

10. Filling and reconstructing gappy data - HANTS

r.hants is an add-on that is not strictly one of the temporal modules, but it is really useful when working with this kind of data, particularly with gappy data. Here's a simple example that generates a list of temporally ordered maps to use as input to r.hants, runs HANTS and then gets a map of dominant frequencies. See the manual for further information on parameter setting and explanations.

# use t.rast.list to create a list of temporally ordered maps  
t.rast.list input=cla order=start_time columns=name -s > map_list

# a hants run
r.hants file=map_list nf=5 fet=0.1 dod=11 base_period=46 suffix=_hants amplitude=amp_hants phase=pha_hants

# dominant frequency map (0 means the dominant freq is 1, 1 that dominant freq is 2, and so on...)
r.series input=`g.mlist rast pat=amp_hants* sep=,` output=dominant_freq_hants method=max_raster

FAQ

Aggregation with defined granularity

Q: I need to aggregate a strds with a granularity of 1 year, but shifting the start day one month in each run, i.e.: changing the start_time to 2003-02-01, 2003-03-01, 2003-04-01 and so on... My question is: if I recursively change start_time with the 'where' parameter, will the module t.rast.aggregate "aggregate" to the next February, March, April (what I'd wish) or just till the end of 2003?

A: If you specify a granularity of a year, then the start time of the aggregation will always be shifted to the 1st January of the current year and the end time to the 1st January of the next year (e.g. 2002-01-01 - 2003-01-01). If you wish to aggregate a full year but shifted one month forward, then simply use a granularity of 12 months.
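
The 12-month windows described in this answer can be listed with a short Python sketch. It only shows the intended interval boundaries when the selection starts on 2003-02-01; it does not call t.rast.aggregate, and the function names are hypothetical.

```python
import datetime


def add_months(day, months):
    """Shift a first-of-month date by a number of months."""
    total = day.year * 12 + (day.month - 1) + months
    return datetime.date(total // 12, total % 12 + 1, 1)


def windows(start, end, months=12):
    """Consecutive [lower, upper) intervals of the given length covering start..end."""
    out = []
    lower = start
    while lower < end:
        upper = add_months(lower, months)
        out.append((lower, upper))
        lower = upper
    return out


for lower, upper in windows(datetime.date(2003, 2, 1), datetime.date(2006, 1, 1)):
    print(lower, upper)
```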

Listing maps with specific start month

Q: I have a strds with 506 maps that correspond to 8-day composite products. I need to sequentially list all maps whose "start_month" is January, February and so on... to use them as input in r.series (or t.rast.series). How can I achieve that?

A: You can use the datetime functionality of SQLite to perform this task. This should work for January:

t.rast.list input=cla_null_mayor65 \
  where='start_time >= datetime(start_time, "start of year") and
  start_time <= datetime(start_time, "start of year", "1 month")'

Expert tricks

BE CAREFUL. THIS IS NOT RECOMMENDED TO NEW USERS.

Creating a TGRASS DB with data from a different mapset

TGRASS is designed to only work with data present in the current mapset. An expert user may override this in order to register data from another mapset in his/her TGRASS database.

To achieve this, the following two internal variables must be set:

g.gisenv set="TGIS_DISABLE_MAPSET_CHECK=True"
g.gisenv set="TGIS_DISABLE_TIMESTAMP_WRITE=True"

in order to disable the mapset check and the writing of the timestamps of each map to the map metadata in the spatial database as text files. These variables can be set per mapset.

Setting these variables to "True" should (hopefully, as this is still partially untested) allow the registration of maps outside the current mapset, even if you do not have the permission to modify the maps.

A warning will be printed if these variables are set True.

BUT, be aware that this feature can lead to the corruption of the temporal database and unwanted side effects. You can mess up the temporal database if you are not 100% sure what you are doing. It is no longer possible to access the timestamp information of these maps using the C-libraries, because the timestamp information is not available in the map metadata text files.
