Temporal data processing

From GRASS-Wiki
Revision as of 13:10, 6 September 2014 by Veroandreo (talk | contribs) (more examples to aggregation section)
Jump to navigation Jump to search

Introduction

TGRASS is the temporal enabled GRASS GIS. It is available from GRASS GIS 7 onwards. TGRASS is completely metadata based, and managing temporal and spatial extent including temporal topology.

Terminology Overview

  • Space time raster datasets (strds) are designed to manage raster map time series. Modules that process strds have the naming prefix t.rast.
  • Space time 3D raster datasets (str3ds) are designed to manage 3D raster map time series. Modules that process str3ds have the naming prefix t.rast3d.
  • Space time vector datasets (stvds) are designed to manage vector map time series. Modules that process stvds have the naming prefix t.vect.

Example workflow for a Chlorophyll-a MODIS time series

The following examples are based on a series of MODIS L3 Chlorophyll-a product that is freely available at the ocean color site. So, say we download the SMI 8-day composite product at 4.6 km resolution for the period 2003-2013. That is a 506 set of images, 46 per year. Data comes as compressed HDF4 files. Chlorophyll products filenames look like this:

A20030012003008.L3m_8D_CHL_chlor_a_4km

 A: MODIS/Aqua
 2003: Year at start
 001: Julian day at start
 2003: Year at end
 008: Julian day at end
 L3m: Level 3 data, mapped (Projection: Plate carrée)
 8D: 8 day composition
 CHL: Chlorophyll a concentration product
 chlor_a: algorithm used 10^(a0 + a1*X + a2*X^2 + a3*X^3 + a4*X^4) 
 4km: 4.6km pixel size (8640x4320 image, 2.5 minute resolution)

We now decompress files and check metadata

# Go to where the data is and decompress
  find -iname '*.bz2' -exec bzip2 -d {} \; 
# Check file meta-data (GDAL utilities)
  gdalinfo A20030012003008.L3m_8D_CHL_chlor_a_4km

Next step is to import all 506 images into GRASS. You can use r.in.gdal or r.external for that. Note that global Cl-a images as downloaded from ocean color site are ~150 mb each (disk space issues!). Here, 3 different options:

  • import global images (506 images, 150 Mb each) with r.in.gdal, resize to study area and remove global files
  • set projection and extension, and resize to study area with gdal_translate, and import already resized images with r.in.gdal
  • import global images (506 images, 150 Mb each) with r.external, resize to study area and remove global files, as showed next:
# define region extension
g.region -p n=-38 s=-55 w=-70 e=-56

suffix=_tmp
for map in *chlor*; do
  r.external input=$map output=${map}${suffix} -o ;
  r.mapcalc expression="$map=${map}${suffix}" ;
  g.remove rast=${map}${suffix} ;
done

Time series processing

Once we have maps inside GRASS we can start the temporal processing. If this is the first time you'll use temporal modules, you need to run

  t.connect -d

to set the default temporal GIS database connection for the current mapset. The default TGIS database of type sqlite3 is located in the PERMANENT mapset directory. Temporal GIS content from all created mapsets will be stored there.

1. Creating a STRDS and registering maps

First step is to create a space time dataset by means of t.create. Let us create a strds for the Chlorophyll-a (Cl-a) time series. We need to define the type (raster, 3D raster or vector), if the time is absolute or relative and, the name of the space time dataset.

t.create type=strds temporaltype=absolute output=cla title="Chlorophyll-a concentration" \
  description="MODIS L3 Chlorophyll-a concentration for Argentinian sea"

Then, we register our 506 raster maps in the strds using t.register. This module assigns time stamps to raster, 3D raster and vector maps and register them into space time datasets. Existing timestamps can be read and used by t.register.

This module supports absolute and relative time. Maps can be registered by command line argument (a list of comma separated map names) or using an input file. The start time, end time and a temporal increment can be provided by command line or in the input file. End time and increment are mutual exclusive. Maps can be registered in several space time datasets using the same timestamp.

Start time and end time with absolute time must be provided using the format yyyy-mm-dd HH:MM:SS +HHMM. It is also supported to specify the date yyyy-mm-dd only. In case of relative time the temporal unit (years, months, days, hours, minutes or seconds) must be provided. The relative start time, end time and the increment are integers.

t.register -i type=rast input=cla_test maps=`g.mlist rast pat=*_chlor_* sep=,`\
  start="2003-01-01" increment="8 days"

This would have been the simplest solution, but 8-day products have a problem, the last image of each year is not a product of an 8-day composition, but 4 or 5-day. Then, when using the increment parameter, dates are not set properly. The solution was to create a list of maps, with their respective start and end date. As the filenames contain information regarding year and DOY (day of year), we can use the following Python script to read filenames and transform DOY to calendar dates (Thanks Soeren!).

# in python
from os import walk
f = []
for (dirpath, dirnames, filenames) in walk("/path/to/the/maps"):
    f.extend(filenames)
    break
# to order the list    
f.sort() 
print f

import datetime
for map_name in f:
  start_year = int(map_name[1:5])
  start_day  = int(map_name[5:8])
  end_year   = int(map_name[8:12])
  end_day    = int(map_name[12:15])
  start = datetime.datetime(start_year, 1, 1) + datetime.timedelta(start_day - 1)
  end = datetime.datetime(int(end_year), 1, 1) + datetime.timedelta(end_day)
  print map_name + '|' + str(start) + '|' + str(end)

Using the number of characters in the filenames and datetime library in Python, you can convert DOY in the filenames into start_time and end_time as in the list you need to pass to t.register. The resulting list looks like this:

 A20030012003008.L3m_8D_CHL_chlor_a_4km_arg|2003-01-01 00:00:00|2003-01-09 00:00:00
 A20030092003016.L3m_8D_CHL_chlor_a_4km_arg|2003-01-09 00:00:00|2003-01-17 00:00:00
 A20030172003024.L3m_8D_CHL_chlor_a_4km_arg|2003-01-17 00:00:00|2003-01-25 00:00:00
 ...
 A20133452013352.L3m_8D_CHL_chlor_a_4km_arg|2013-12-11 00:00:00|2013-12-19 00:00:00
 A20133532013360.L3m_8D_CHL_chlor_a_4km_arg|2013-12-19 00:00:00|2013-12-27 00:00:00
 A20133612013365.L3m_8D_CHL_chlor_a_4km_arg|2013-12-27 00:00:00|2014-01-01 00:00:00

and then, the command would be:

t.register --o type=rast input=cla file=map_list

We can also set a color palette for all maps in the strds with:

t.rast.colors input=cla color=name_of_color_table

2. Getting some basic info and statistics

We now check the space time data sets we have in our mapset with:

t.list

and list information about our recently created strds. See t.info for additional uses.

t.info type=strds input=cla
 +-------------------- Space Time Raster Dataset -----------------------------+
 |                                                                            |
 +-------------------- Basic information -------------------------------------+
 | Id: ........................ cla@clorofila
 | Name: ...................... cla
 | Mapset: .................... clorofila
 | Creator: ................... veroandreo
 | Temporal type: ............. absolute
 | Creation time: ............. 2014-04-29 14:23:00.579342
 | Modification time:.......... 2014-05-12 09:15:08.917309
 | Semantic type:.............. mean
 +-------------------- Absolute time -----------------------------------------+
 | Start time:................. 2003-01-01 00:00:00
 | End time:................... 2014-01-01 00:00:00
 | Granularity:................ 1 day
 | Temporal type of maps:...... interval
 +-------------------- Spatial extent ----------------------------------------+
 | North:...................... -38.0
 | South:...................... -55.0
 | East:.. .................... -55.0
 | West:....................... -70.0
 | Top:........................ 0.0
 | Bottom:..................... 0.0
 +-------------------- Metadata information ----------------------------------+
 | Raster register table:...... raster_map_register_91ba57d5f0924f4fa0bd7176a1b39b2f
 | North-South resolution min:. 0.041667
 | North-South resolution max:. 0.041667
 | East-west resolution min:... 0.041667
 | East-west resolution max:... 0.041667
 | Minimum value min:.......... 0.02925
 | Minimum value max:.......... 0.26472
 | Maximum value min:.......... 5.2104
 | Maximum value max:.......... 99.953934
 | Aggregation type:........... None
 | Number of registered maps:.. 506
 |
 | Title:
 | Chlorophyll-a
 | Description:
 | Concentracion de Clorofila a
 | Command history:
 | # 2014-04-29 14:23:00 
 | t.create type="strds" temporaltype="absolute"
 |     output="cla" title="Chlorophyll-a"
 |     description="Concentracion de Clorofila a" --o
 | # 2014-04-29 14:23:23 
 | t.register --o type="rast" input="cla"
 |     file="map_list"
 | 
 +----------------------------------------------------------------------------+

Now, we get univariate statistics from the non-null cells for each registered raster map of the strds. For that matter we use t.rast.univar (link) which, by default, returns the name of the map, the start and end date of dataset and the following values: mean, minimum and maximum vale, mean_of_abs, standard deviation, variance, coeff_var, number of null cells, total number of cell.

t.rast.univar cla

In Linux-based systems you can send the output to a text file using

t.rast.univar cla > stats_cla

3. Listing maps and selections

The module t.rast.list allows you to list registered maps of a strds and provides several options to achieve what you want. For example, you can select different granules and also perform different queries by means of the where parameter. Some examples are:

t.rast.list cla method=gran granule="1 month"
# this will give one image every one month, 3 months, 1 year, or whatever granule you choose

t.rast.list cla order=min columns=id,name,start_time,min where="min <= '0.05'" 
# this will order by minimum value all the maps in the strds that have a minimum value lower than or equal to 0.05

t.rast.list input=cla order=max columns=name,start_time,max where="max > '10.0'"
# maps ordered by maximum value in which maximum value is higher than 10.

t.rast.list input=cla where="start_time >= '2003-01' and start_time <= '2003-06'" 
# all the maps in the first 6 month of the time series

In the where parameter you can use sql datetime functions. Then, to get for example, all maps which start date is in January, we can do:

t.rast.list cla where="strftime('%m', start_time)='01'"

# or 

t.rast.list input=cla \
  where='start_time >= datetime(start_time, "start of year") and start_time <= datetime(start_time, "start of year", "1 month")'

If you have monthly (instead of 8-day products) data and you want to list all January maps, then you can do:

t.rast.list input=cla_orig where="start_time=datetime(start_time, 'start of year', ' 0 month')"

4. Visualization

There are different visualization options for strds.

  • g.gui.timeline allows to compare temporal datasets by displaying their temporal extents in a plot.
# only temporal extent
g.gui.timeline cla
# temporal and spatio-temporal extent
g.gui.timeline -3 cla
Temporal and spatio-temporal extent of cla strds.
  • g.gui.tplot allows to see the values of one or more temporal datasets for a queried point defined by a coordinate pair.

Steps to use this module are:

  1. Select strds
  2. Select pair of coordinates (east,north) or point in the map
  3. Hit Run
  4. Customize as desired
  5. Save
Time series plot (Chlorophyll vs Time) for a certain coordinate pair in the study area
  • g.gui.animation is the tool for animating a series of raster and vector maps or a space time raster or vector dataset.
g.gui.animation strds=cla

5. Aggregation

For aggregations of data with different methods and different granularities, there are two very useful commands:

  • t.rast.series that performs different aggregation algorithms from r.series on all or a subset of raster maps in a strds, and
  • t.rast.aggregate that temporally aggregates the maps in a strds by a user defined granularity.

With these modules it is very simple to get maps of basic statistical parameters for different temporal granules, and this permits the analysis of the spatio-temporal variability of the variable of interest.

Some examples:

# yearly aggregation
t.rast.aggregate input=cla output=cla_yearly_average \
  base=cla_yearly_average granularity="1 years" \
  method=average sampling=start

# yearly aggregation with corresponding methods (output: 7 strds with 11 maps each)
for method in average median mode minimum maximum stddev range ; 
do
t.rast.aggregate input=cla output=cla_yearly_${method} \
                 base=cla_yearly_${method} granularity="1 years" \
                 method=${method} sampling=start
done

With the where parameter, we can select all 8-day products which start_time is 01 (January) over the years. Like this we can get the so-known climatologies.

t.rast.series input=cla method=average where="strftime('%m', start_time)='01'" output=january_average

Generalizing a bit, we can:

# climatologies for every month 
for i in 01 02 03 04 05 06 07 08 09 10 11 12 ; do 
  for m in average median mode stddev range minimum maximum ; do 
    t.rast.series input=cla method=${m} where="strftime('%m', start_time)='${i}'" output=${m}_${i}
  done
done

Using climatologies previously obtained, we'll now estimate monthly anomalies in the mean, max and min Cl-a concentration. First, we need to monthly aggregate data, and then do the difference between the monthly climatology and each respective monthly aggregate. We'll do the aggregation for the average, minimum and maximum of Cl-a concentration (from 506 input maps in cla strds, we'll get 132 maps in each monthly aggregated new strds).

# monthly aggregates (132 maps)

for method in average minimum maximum ; do
t.rast.aggregate input=cla output=cla_monthly_${method} base=cla_monthly_${method} \
  granularity="1 months" method=${method} sampling=contains
done

# January anomalies in mean, min and max Cl-a

t.rast.list -s input=cla_monthly_average where="start_time=datetime(start_time, 'start of year', '0 month')" columns=name

for m in average minimum maximum ; do 
  for i in 1 13 25 37 49 61 73 85 97 109 121 ; do # these numbers correspond to all january monthly aggregates
    r.mapcalc expression="Jan_${m}_anomaly_${i}=01_${m}-cla_monthly_${m}_${i}" 
  done
done

You can also use t.rast.list for looping, the same way you use g.mlist:

for map in `t.rast.list -s input=cla_monthly_average where="start_time=datetime(start_time, 'start of year', '0 month')" columns=name`; 
do
r.mapcalc expression="anomaly_${map}=01_average-${map}" 
done

Say we now need to know the date of the maximum value of Cl-a concentration over all the study period and/or on a yearly basis:

# map index for the overall maximum Cl-a value 
t.rast.series input=cla method=max_raster output=cla_max_index

# map index for the yearly maximum Cl-a value 
t.rast.aggregate input=cla granularity="1 year" method=max_raster output=yearly_max_index basename=yearly_max_index

The outputs show (pixelwise) the map index in which the maximum value of Cl-a occurs (from 1 to 506 and from 1 to 46, for the whole time series and on a yearly basis, respectively). For relative time this is maybe enough, but you may then want to reclassify data to get DOY, for example. In that case, you may use r.reclass.

If you already have monthly data, you can get climatologies quite simply as follows:

# January averages
t.rast.series input=cla_monthly method=average where=start_time=datetime(start_time, 'start of year', '0 month') output=jan_average

6. TODO

(add some more)

FAQ

Aggregation with defined granularity

Q: I need to aggregate a strds with a granularity of 1 year, but shifting the start day one month in each run, i.e.: changing the start_time to 2003-02-01, 2003-03-01, 2003-04-01 and so on... My question is: if i recursively change start_time with the 'where' parameter, will the module t.rast.aggregate "aggregate" to the next february, march, april (what i'd wish) or just till the end of 2003?

A: If you specify a granularity of a year, then the start time to perform the aggregation will always be shifted to the 1st January of the current year and the end time the 1st January of the next year (eg. 2002-01-01 - 2003-01-01).  If you wish to aggregate a full year but shifting one month forward then simply use a granularity of 12 months.

Generating time series input (FIND BETTER TITLE)

TODO: reformulate the question a bit

Q: I have a strds with 506 maps that correspond to 8-day composite products (11 years). I want to get the list of maps where "start_month" is January, February and so on... Now I want to use them as input in r.series (or t.rast.series), but how? Note that I do not want to aggregate maps by month, since I need to use the original maps belonging to each month. Is there a way to achieve that?

A: You can use the datetime functionality of SQLite to perform this task, this should work for January:

t.rast.list input=cla_null_mayor65 \
  where='start_time >= datetime(start_time, "start of year") and
  start_time <= datetime(start_time, "start of year", "1 month")'

Expert tricks

BE CAREFUL. THIS IS NOT RECOMMENDED TO NEW USERS.

Creating a TGRASS DB with data from a different mapset

TGRASS is designed to only work with data present in the current mapset. An expert user may override this in order to register data from another mapset in his/her TGRASS database.

To achieve this, the following two internal variables must be set:

g.gisenv set="TGIS_DISABLE_MAPSET_CHECK=True"
g.gisenv set="TGIS_DISABLE_TIMESTAMP_WRITE=True"

in order to disable the mapset check and the writing of the timestamps of each map to the map metadata in the spatial database as text files. These variables can be set mapset specific.

Settings these variables "True" should (hopefully, because yet partially untested) allow the registration of maps outside the current mapset, even if you do not have the permission to modify the maps.

A warning will be printed if these variables are set True.

BUT, be aware that this feature can lead to the corruption of the temporal database and unwanted side effects. You can mess up the temporal database if you are not 100% sure what you are doing. It is no longer possible to access the timestamp information of these maps using the C-libraries, because the timestamp information is not available in the map metadata text files.

More expert tricks

Perhaps to be added...

References

See also