Temporal data processing
Edited by Veroandreo
Revision as of 15:42, 6 September 2014
Introduction
TGRASS is the temporal enabled GRASS GIS, available from GRASS GIS 7 onwards. TGRASS is completely metadata based and manages the temporal and spatial extent of datasets, including their temporal topology.
Terminology Overview
- Space time raster datasets (strds) are designed to manage raster map time series. Modules that process strds have the naming prefix t.rast.
- Space time 3D raster datasets (str3ds) are designed to manage 3D raster map time series. Modules that process str3ds have the naming prefix t.rast3d.
- Space time vector datasets (stvds) are designed to manage vector map time series. Modules that process stvds have the naming prefix t.vect.
Example workflow for a Chlorophyll-a MODIS time series
The following examples are based on a series of MODIS L3 Chlorophyll-a products that are freely available at the ocean color site. Say we download the SMI 8-day composite product at 4.6 km resolution for the period 2003-2013: that is a set of 506 images, 46 per year. Data come as compressed HDF4 files. Chlorophyll product filenames look like this:
A20030012003008.L3m_8D_CHL_chlor_a_4km
- A: MODIS/Aqua
- 2003: Year at start
- 001: Julian day at start
- 2003: Year at end
- 008: Julian day at end
- L3m: Level 3 data, mapped (Projection: Plate carrée)
- 8D: 8-day composition
- CHL: Chlorophyll-a concentration product
- chlor_a: algorithm used, 10^(a0 + a1*X + a2*X^2 + a3*X^3 + a4*X^4)
- 4km: 4.6 km pixel size (8640x4320 image, 2.5 minute resolution)
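The fixed-width structure above can be parsed mechanically. A minimal Python sketch, assuming every filename follows exactly this layout (the regular expression and the `parse_name` helper are illustrative, not part of GRASS or the ocean color tools), splits a name into these fields:

```python
import re

# Field layout of an ocean color L3 mapped filename, as listed above.
# re.match anchors at the start only, so trailing suffixes (e.g. "_arg") are ignored.
PATTERN = re.compile(
    r"^(?P<sensor>[A-Z])"
    r"(?P<start_year>\d{4})(?P<start_doy>\d{3})"
    r"(?P<end_year>\d{4})(?P<end_doy>\d{3})"
    r"\.(?P<level>L3m)_(?P<period>[0-9A-Z]+)_(?P<product>[A-Z]+)"
    r"_(?P<parameter>[a-z]+_[a-z]+)_(?P<resolution>\d+km)"
)


def parse_name(name):
    """Return the filename fields as a dict of strings; raise ValueError if no match."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name}")
    return match.groupdict()


print(parse_name("A20030012003008.L3m_8D_CHL_chlor_a_4km"))
```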
We now decompress the files and check their metadata:
# Go to where the data is and decompress
find . -iname '*.bz2' -exec bzip2 -d {} \;
# Check file meta-data (GDAL utilities)
gdalinfo A20030012003008.L3m_8D_CHL_chlor_a_4km
The next step is to import all 506 images into GRASS, using either r.in.gdal or r.external. Note that the global Cl-a images downloaded from the ocean color site are ~150 MB each (mind the disk space!). There are three options:
- import the global images (506 images, 150 MB each) with r.in.gdal, crop them to the study area and remove the global maps
- set the projection and extent and crop to the study area with gdal_translate, then import the already cropped images with r.in.gdal
- link the global images (506 images, 150 MB each) with r.external, crop them to the study area and remove the links to the global files, as shown next:
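# define region extent (study area)
g.region -p n=-38 s=-55 w=-70 e=-56
suffix=_tmp
for map in *chlor* ; do
  # link the global file without importing it (-o: override projection check)
  r.external input=$map output=${map}${suffix} -o
  # r.mapcalc only writes the current region, i.e. the cropped study area
  r.mapcalc expression="$map=${map}${suffix}"
  # remove the link to the global file
  g.remove rast=${map}${suffix}
done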
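To get an idea of the savings, this small Python sketch computes the size of the cropped grid, assuming the current region keeps the product's 2.5 arc-minute (1/24 degree) resolution; `grid_shape` is a hypothetical helper. With the bounds used above it gives 408 rows by 336 columns, about 137 thousand cells against roughly 37 million in the global 8640x4320 image.

```python
RES_DEG = 2.5 / 60.0  # 2.5 arc-minutes = 1/24 degree (about 4.6 km)


def grid_shape(north, south, west, east, res=RES_DEG):
    """Rows and columns of a lat/long region at the given resolution in degrees."""
    rows = round((north - south) / res)  # round() absorbs floating-point noise
    cols = round((east - west) / res)
    return rows, cols


rows, cols = grid_shape(north=-38, south=-55, west=-70, east=-56)
global_cells = 8640 * 4320
print(rows, cols, rows * cols, f"{rows * cols / global_cells:.2%} of the global grid")
```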
Time series processing
Once we have maps inside GRASS we can start the temporal processing. If this is the first time you'll use temporal modules, you need to run
t.connect -d
to set the default temporal GIS database connection for the current mapset. The default TGIS database of type sqlite3 is located in the PERMANENT mapset directory. Temporal GIS content from all created mapsets will be stored there.
1. Creating a STRDS and registering maps
The first step is to create a space time dataset by means of t.create. Let us create a strds for the Chlorophyll-a (Cl-a) time series. We need to define the type (raster, 3D raster or vector), whether the time is absolute or relative, and the name of the space time dataset.
t.create type=strds temporaltype=absolute output=cla title="Chlorophyll-a concentration" \
description="MODIS L3 Chlorophyll-a concentration for Argentinian sea"
Then we register our 506 raster maps in the strds using t.register. This module assigns timestamps to raster, 3D raster and vector maps and registers them in space time datasets. Existing timestamps can be read and used by t.register.
This module supports absolute and relative time. Maps can be registered by command-line argument (a comma-separated list of map names) or by using an input file. The start time, end time and a temporal increment can be provided on the command line or in the input file. End time and increment are mutually exclusive. Maps can be registered in several space time datasets using the same timestamp.
With absolute time, start and end time must be provided in the format yyyy-mm-dd HH:MM:SS +HHMM; specifying only the date (yyyy-mm-dd) is also supported. With relative time, the temporal unit (years, months, days, hours, minutes or seconds) must be provided, and the relative start time, end time and increment are integers.
t.register -i type=rast input=cla maps=`g.mlist rast pat="*_chlor_*" sep=,` \
start="2003-01-01" increment="8 days"
This would have been the simplest solution, but 8-day products have a catch: the last image of each year is not an 8-day composite but a 5- or 6-day one (DOY 361 to 365, or to 366 in leap years). As a result, when using the increment parameter, dates are not set properly. The solution was to create a list of maps with their respective start and end dates. As the filenames contain the year and DOY (day of year), we can use the following Python script to read the filenames and convert DOY to calendar dates (thanks Soeren!).
# in python
import datetime
from os import walk

f = []
# read only the top-level directory (break after the first walk() step)
for (dirpath, dirnames, filenames) in walk("/path/to/the/maps"):
    f.extend(filenames)
    break
# sort the list (names start with year and DOY, so this is chronological)
f.sort()
print(f)

for map_name in f:
    start_year = int(map_name[1:5])
    start_day = int(map_name[5:8])
    end_year = int(map_name[8:12])
    end_day = int(map_name[12:15])
    # DOY 1 is January 1st, hence the "- 1"
    start = datetime.datetime(start_year, 1, 1) + datetime.timedelta(start_day - 1)
    # end time is the day after the last composited day, so consecutive intervals touch
    end = datetime.datetime(end_year, 1, 1) + datetime.timedelta(end_day)
    print(map_name + '|' + str(start) + '|' + str(end))
Using fixed character positions in the filenames and Python's datetime library, the DOY values in the filenames are converted into the start_time and end_time that t.register expects in its input list. The resulting list looks like this:
A20030012003008.L3m_8D_CHL_chlor_a_4km_arg|2003-01-01 00:00:00|2003-01-09 00:00:00
A20030092003016.L3m_8D_CHL_chlor_a_4km_arg|2003-01-09 00:00:00|2003-01-17 00:00:00
A20030172003024.L3m_8D_CHL_chlor_a_4km_arg|2003-01-17 00:00:00|2003-01-25 00:00:00
...
A20133452013352.L3m_8D_CHL_chlor_a_4km_arg|2013-12-11 00:00:00|2013-12-19 00:00:00
A20133532013360.L3m_8D_CHL_chlor_a_4km_arg|2013-12-19 00:00:00|2013-12-27 00:00:00
A20133612013365.L3m_8D_CHL_chlor_a_4km_arg|2013-12-27 00:00:00|2014-01-01 00:00:00
and then, the command would be:
t.register --o type=rast input=cla file=map_list
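The date problem with a fixed increment is easy to reproduce. This Python sketch (the helper names are hypothetical) compares start dates spaced by a fixed 8-day step from 2003-01-01, which is roughly what increment="8 days" produces, with the start dates encoded in the filenames, which restart at DOY 1 every year:

```python
import datetime


def dates_by_increment(first, step_days, count):
    """Start dates spaced by a fixed number of days, ignoring year boundaries."""
    return [first + datetime.timedelta(days=step_days * i) for i in range(count)]


def dates_from_doy(years, step_days=8, per_year=46):
    """Start dates as encoded in the filenames: DOY 1, 9, 17, ... restarting each year."""
    return [
        datetime.date(year, 1, 1) + datetime.timedelta(days=step_days * i)
        for year in years
        for i in range(per_year)
    ]


by_increment = dates_by_increment(datetime.date(2003, 1, 1), 8, 506)
by_name = dates_from_doy(range(2003, 2014))

# index (0-based) of the first map whose two start dates disagree
first_mismatch = next(i for i, (a, b) in enumerate(zip(by_increment, by_name)) if a != b)
print(first_mismatch, by_increment[first_mismatch], by_name[first_mismatch])
```

Both schemes agree for the 46 maps of 2003 but already differ at the first map of 2004 (2004-01-04 instead of 2004-01-01), and the offset keeps growing every year, which is why the map_list file with explicit start and end times is used.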
We can also set a color palette for all maps in the strds with:
t.rast.colors input=cla color=name_of_color_table
2. Getting some basic info and statistics
We now check the space time data sets we have in our mapset with:
t.list
and list information about our recently created strds. See t.info for additional uses.
t.info type=strds input=cla
+-------------------- Space Time Raster Dataset -----------------------------+
|                                                                            |
+-------------------- Basic information -------------------------------------+
| Id: ........................ cla@clorofila
| Name: ...................... cla
| Mapset: .................... clorofila
| Creator: ................... veroandreo
| Temporal type: ............. absolute
| Creation time: ............. 2014-04-29 14:23:00.579342
| Modification time:.......... 2014-05-12 09:15:08.917309
| Semantic type:.............. mean
+-------------------- Absolute time -----------------------------------------+
| Start time:................. 2003-01-01 00:00:00
| End time:................... 2014-01-01 00:00:00
| Granularity:................ 1 day
| Temporal type of maps:...... interval
+-------------------- Spatial extent ----------------------------------------+
| North:...................... -38.0
| South:...................... -55.0
| East:.. .................... -55.0
| West:....................... -70.0
| Top:........................ 0.0
| Bottom:..................... 0.0
+-------------------- Metadata information ----------------------------------+
| Raster register table:...... raster_map_register_91ba57d5f0924f4fa0bd7176a1b39b2f
| North-South resolution min:. 0.041667
| North-South resolution max:. 0.041667
| East-west resolution min:... 0.041667
| East-west resolution max:... 0.041667
| Minimum value min:.......... 0.02925
| Minimum value max:.......... 0.26472
| Maximum value min:.......... 5.2104
| Maximum value max:.......... 99.953934
| Aggregation type:........... None
| Number of registered maps:.. 506
|
| Title:
| Chlorophyll-a
| Description:
| Concentracion de Clorofila a
| Command history:
| # 2014-04-29 14:23:00
| t.create type="strds" temporaltype="absolute"
|     output="cla" title="Chlorophyll-a"
|     description="Concentracion de Clorofila a" --o
| # 2014-04-29 14:23:23
| t.register --o type="rast" input="cla"
|     file="map_list"
|
+----------------------------------------------------------------------------+
Now we get univariate statistics from the non-null cells of each registered raster map of the strds. For that we use t.rast.univar which, by default, returns the name of each map, its start and end date and the following values: mean, minimum and maximum value, mean_of_abs, standard deviation, variance, coeff_var, number of null cells and total number of cells.
t.rast.univar cla
To save the output to a text file, redirect it in the shell:
t.rast.univar cla > stats_cla
3. Listing maps and selections
The module t.rast.list allows you to list registered maps of a strds and provides several options to achieve what you want. For example, you can select different granules and also perform different queries by means of the where parameter. Some examples are:
t.rast.list cla method=gran granule="1 month"
# this will give one image every one month, 3 months, 1 year, or whatever granule you choose
t.rast.list cla order=min columns=id,name,start_time,min where="min <= '0.05'"
# this will order by minimum value all the maps in the strds that have a minimum value lower than or equal to 0.05
t.rast.list input=cla order=max columns=name,start_time,max where="max > '10.0'"
# maps ordered by maximum value in which maximum value is higher than 10.
t.rast.list input=cla where="start_time >= '2003-01-01' and start_time < '2003-07-01'"
# all the maps in the first 6 months of the time series
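Because SQLite has no native datetime type, start times are compared here as ISO text, so the exact bounds of such a where clause matter. A minimal sketch with Python's built-in sqlite3 module, assuming an in-memory table with a hypothetical maps(name, start_time) layout rather than the real TGIS schema, counts the 2003 8-day maps selected by a month-only upper bound ('2003-06') and by full dates with an exclusive upper bound ('2003-07-01'):

```python
import datetime
import sqlite3

# In-memory stand-in for the register table (illustrative layout, not the TGIS schema)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE maps (name TEXT, start_time TEXT)")
for i in range(46):  # the 46 8-day composites of 2003 (DOY 1, 9, ..., 361)
    day = datetime.date(2003, 1, 1) + datetime.timedelta(days=8 * i)
    con.execute("INSERT INTO maps VALUES (?, ?)", (f"map_{i + 1}", f"{day} 00:00:00"))


def count(where):
    """Number of rows selected by a where clause."""
    return con.execute(f"SELECT count(*) FROM maps WHERE {where}").fetchone()[0]


# Text comparison: every June timestamp sorts after the bare string '2003-06'
month_prefix_bound = count("start_time >= '2003-01' and start_time <= '2003-06'")
# Full dates with an exclusive upper limit keep all of June
full_date_bound = count("start_time >= '2003-01-01' and start_time < '2003-07-01'")
print(month_prefix_bound, full_date_bound)
```

The first query stops at May (19 maps), because '2003-06-02 00:00:00' sorts after '2003-06'; the second returns all 23 maps from January to June.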
The where parameter also accepts SQL datetime functions (SQLite syntax with the default sqlite3 backend). For example, to get all maps whose start date is in January, we can do:
t.rast.list cla where="strftime('%m', start_time)='01'"
# or
t.rast.list input=cla \
where='start_time >= datetime(start_time, "start of year") and start_time < datetime(start_time, "start of year", "1 month")'
If you have monthly data (instead of 8-day products) and want to list all January maps, you can select the maps that start exactly on January 1st:
t.rast.list input=cla_orig where="start_time=datetime(start_time, 'start of year', '0 month')"
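The January selections above behave differently on 8-day data. This Python sketch runs them with the built-in sqlite3 module against an in-memory table of 2003 start times (the table layout and the `names` helper are hypothetical, not the TGIS schema): the exact-match on January 1st, meant for monthly data, keeps only one 8-day map, while the month test and the date range keep all four January composites.

```python
import datetime
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE maps (name TEXT, start_time TEXT)")
for i in range(46):  # hypothetical 8-day start times for 2003
    day = datetime.date(2003, 1, 1) + datetime.timedelta(days=8 * i)
    con.execute("INSERT INTO maps VALUES (?, ?)", (f"map_{i + 1}", f"{day} 00:00:00"))


def names(where):
    """Map names selected by a where clause, in time order."""
    query = f"SELECT name FROM maps WHERE {where} ORDER BY start_time"
    return [row[0] for row in con.execute(query)]


exact_jan_1 = names("start_time = datetime(start_time, 'start of year', '0 month')")
by_month = names("strftime('%m', start_time) = '01'")
by_range = names("start_time >= datetime(start_time, 'start of year') "
                 "and start_time < datetime(start_time, 'start of year', '1 month')")
print(exact_jan_1, by_month, by_range)
```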
4. Visualization
There are different visualization options for strds.
- g.gui.timeline allows you to compare temporal datasets by displaying their temporal extents in a plot.
# only temporal extent
g.gui.timeline cla
# temporal and spatio-temporal extent
g.gui.timeline -3 cla
- g.gui.tplot allows you to see the values of one or more temporal datasets at a queried point defined by a coordinate pair.
Steps to use this module are:
- Select strds
- Select pair of coordinates (east,north) or point in the map
- Hit Run
- Customize as desired
- Save
- g.gui.animation is the tool for animating a series of raster and vector maps or a space time raster or vector dataset.
g.gui.animation strds=cla
5. Aggregation
For aggregations of data with different methods and different granularities, there are two very useful commands:
- t.rast.series that performs different aggregation algorithms from r.series on all or a subset of raster maps in a strds, and
- t.rast.aggregate that temporally aggregates the maps in a strds by a user defined granularity.
With these modules it is very simple to get maps of basic statistical parameters for different temporal granules, and this permits the analysis of the spatio-temporal variability of the variable of interest.
Some examples:
# yearly aggregation
t.rast.aggregate input=cla output=cla_yearly_average \
  basename=cla_yearly_average granularity="1 years" \
  method=average sampling=start
# yearly aggregation with corresponding methods (output: 7 strds with 11 maps each)
for method in average median mode minimum maximum stddev range ; do
  t.rast.aggregate input=cla output=cla_yearly_${method} \
    basename=cla_yearly_${method} granularity="1 years" \
    method=${method} sampling=start
done
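As a rough model of yearly aggregation, this Python sketch groups the 506 8-day intervals by the year in which each one starts, assuming sampling=start means that a map is assigned to the granule containing its start time. The interval generator, which truncates the last composite of each year at 1 January, is a hypothetical helper based on the filename convention above, not GRASS code:

```python
import datetime
from collections import defaultdict


def eight_day_intervals(years):
    """(start, end) per composite: starts at DOY 1, 9, ..., 361; end is exclusive."""
    intervals = []
    for year in years:
        for i in range(46):
            start = datetime.date(year, 1, 1) + datetime.timedelta(days=8 * i)
            # the last composite of a year ends on 1 January of the next year
            end = min(start + datetime.timedelta(days=8), datetime.date(year + 1, 1, 1))
            intervals.append((start, end))
    return intervals


def group_by_start_year(intervals):
    """Assign each interval to the yearly granule that contains its start time."""
    groups = defaultdict(list)
    for start, end in intervals:
        groups[start.year].append((start, end))
    return dict(groups)


groups = group_by_start_year(eight_day_intervals(range(2003, 2014)))
print(len(groups), {year: len(maps) for year, maps in groups.items()})
```

Each of the 11 yearly granules receives 46 maps, consistent with the 11 maps per output strds noted in the comment above.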
With the where parameter, we can select all 8-day products whose start_time falls in January (month 01) over all years. This way we get the so-called climatologies.
t.rast.series input=cla method=average where="strftime('%m', start_time)='01'" output=january_average
Generalizing a bit, we can:
# climatologies for every month
for i in 01 02 03 04 05 06 07 08 09 10 11 12 ; do
  for m in average median mode stddev range minimum maximum ; do
    t.rast.series input=cla method=${m} where="strftime('%m', start_time)='${i}'" output=${m}_${i}
  done
done
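For a single pixel, each climatology map above is a statistic over all original 8-day maps whose start date falls in a given calendar month, across all years. Since a month contains the start of 3 or 4 composites, this is an average of 8-day values rather than an average of monthly means. A minimal Python sketch of the average case, with hypothetical values, None standing for a null cell (assumed to be ignored) and a hypothetical `monthly_climatology` helper:

```python
import datetime
from collections import defaultdict
from statistics import mean


def monthly_climatology(series):
    """series: list of (start_date, value) for one pixel; None marks a null cell.
    Returns {month: mean of all non-null values whose start date falls in that month}."""
    by_month = defaultdict(list)
    for start, value in series:
        if value is not None:  # null cells are skipped
            by_month[start.month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Hypothetical Cl-a values for a single pixel
pixel = [
    (datetime.date(2003, 1, 1), 1.0),
    (datetime.date(2003, 1, 9), 2.0),
    (datetime.date(2004, 1, 1), 3.0),
    (datetime.date(2004, 1, 9), None),
    (datetime.date(2004, 2, 2), 0.5),
]
print(monthly_climatology(pixel))
```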
Using the climatologies obtained above, we'll now estimate monthly anomalies in the mean, maximum and minimum Cl-a concentration. First, we aggregate the data by month, and then we compute the difference between each monthly climatology and the respective monthly aggregates. We'll aggregate the average, minimum and maximum Cl-a concentration (from the 506 input maps in the cla strds, we get 132 maps in each new monthly strds).
# monthly aggregates (132 maps)
for method in average minimum maximum ; do
  t.rast.aggregate input=cla output=cla_monthly_${method} basename=cla_monthly_${method} \
    granularity="1 months" method=${method} sampling=contains
done
# January anomalies in mean, min and max Cl-a
t.rast.list -s input=cla_monthly_average where="start_time=datetime(start_time, 'start of year', '0 month')" columns=name
for m in average minimum maximum ; do
  for i in 1 13 25 37 49 61 73 85 97 109 121 ; do # these numbers correspond to all January monthly aggregates
    # January climatologies from the loop above are named ${m}_01 (e.g. average_01)
    r.mapcalc expression="Jan_${m}_anomaly_${i}=${m}_01-cla_monthly_${m}_${i}"
  done
done
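The list 1 13 25 ... 121 in the loop above follows from simple arithmetic: in a monthly strds that starts in January 2003, the maps of a given month sit 12 positions apart. A small Python sketch, assuming the aggregated maps are numbered from 1 in time order (as the cla_monthly_${m}_${i} names imply) and using a hypothetical `month_indices` helper, generates those numbers for any month:

```python
def month_indices(month, n_years, first_index=1):
    """Positions of a calendar month's maps in a monthly strds starting in January,
    assuming maps are numbered first_index, first_index + 1, ... in time order."""
    return [first_index + (month - 1) + 12 * k for k in range(n_years)]


january = month_indices(1, 11)
print(" ".join(str(i) for i in january))
```

The same helper gives 2 14 26 ... for February and ends at 132, the last map, for December.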
You can also use t.rast.list for looping, the same way you use g.mlist:
for map in `t.rast.list -s input=cla_monthly_average where="start_time=datetime(start_time, 'start of year', '0 month')" columns=name` ; do
  # average_01 is the January average climatology computed above
  r.mapcalc expression="anomaly_${map}=average_01-${map}"
done
Say we now need to know the date of the maximum value of Cl-a concentration over all the study period and/or on a yearly basis:
# map index for the overall maximum Cl-a value
t.rast.series input=cla method=max_raster output=cla_max_index
# map index for the yearly maximum Cl-a value
t.rast.aggregate input=cla granularity="1 year" method=max_raster output=yearly_max_index basename=yearly_max_index
The outputs show, pixelwise, the index of the map in which the maximum Cl-a value occurs (from 1 to 506 for the whole time series and from 1 to 46 on a yearly basis). For relative time this may be enough, but you may want to reclassify the index to DOY, for example. In that case, you may use r.reclass.
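Converting a within-year index into a DOY only needs a table of rules, since the 8-day composites start at DOY 1, 9, 17, ..., 361. This Python sketch writes such a table in r.reclass's "input = output" rule syntax; it assumes the yearly index runs from 1 to 46 as stated above (pass first_index=0 if it turns out to be zero-based), and `reclass_rules` is a hypothetical helper:

```python
def reclass_rules(n_maps=46, step_days=8, first_index=1):
    """r.reclass rules mapping a within-year map index to the start DOY of that composite.
    Assumes composites start at DOY 1, 1 + step_days, 1 + 2 * step_days, ..."""
    lines = [
        f"{index} = {1 + step_days * (index - first_index)}"
        for index in range(first_index, first_index + n_maps)
    ]
    return "\n".join(lines) + "\n"


rules = reclass_rules()
print(rules.splitlines()[:3])
```

The rules could then be saved to a file (say index_to_doy.txt, a made-up name) and passed through r.reclass's rules option, e.g. r.reclass input=yearly_max_index_1 output=yearly_max_doy_1 rules=index_to_doy.txt, where both map names are hypothetical. r.reclass only accepts integer maps, so the index map may need to be converted first.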
If you already have monthly data, you can get climatologies quite simply as follows:
# January averages
t.rast.series input=cla_monthly method=average where="start_time=datetime(start_time, 'start of year', '0 month')" output=jan_average
6. Spatio-temporal algebra with STRDS
The module t.rast.mapcalc allows us to perform spatio-temporal mapcalc expressions on temporally sampled maps of strds. Spatial and temporal operators are available for the "expression" string. Spatial operators, functions and internal variables are those used in r.mapcalc. Temporal internal variables supported for both relative and absolute time include td(), start_time() and end_time(). There are also several very useful internal variables specific to absolute time, referring to the current sample interval or instance, e.g. start_doy(), start_year(), start_month() and so on (see the t.rast.mapcalc manual page for further details and examples).
Some examples now. Say we did some analysis and decided that we will only consider values higher than 0.05. Then, we need to set all values below that threshold to null.
t.rast.mapcalc inputs=cla expression="if(cla < 0.05, null(), cla)" output=cla_corrected basename=cla_corrected
or we may want to set erroneous negative values to the known minimum of the strds. Note that this only helps where min_cla is itself non-negative: wherever a pixel's series contains negative values, its minimum is negative too.
t.rast.series input=cla method=minimum output=min_cla
t.rast.mapcalc inputs=cla expression="if(cla < 0.0, min_cla, cla)" output=cla_corrected basename=cla_corrected
We may also need to reclassify all maps in the strds according to a certain threshold, e.g.: a certain level of Cl-a that indicates bloom conditions, in order to get bloom frequency afterwards:
# reclassify
t.rast.mapcalc -n input=cla output=cla_bloom basename=cla_bloom expression="if(cla > 0.75, 1, null())"
# bloom frequency
t.rast.series input=cla_bloom output=bloom_freq method=count
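For a single pixel, the two steps above reduce to flagging the values above the bloom threshold and counting the flags. A minimal Python sketch of that logic, with None standing for a null cell and a hypothetical `bloom_frequency` helper:

```python
def bloom_frequency(values, threshold=0.75):
    """One pixel's series (None = null): flag values above the threshold as 1 and the
    rest as null, as the if() expression does, then count the non-null flags."""
    flags = [1 if v is not None and v > threshold else None for v in values]
    return sum(1 for flag in flags if flag is not None)


print(bloom_frequency([0.2, 0.8, None, 1.5, 0.75]))
```

A value of exactly 0.75 is not counted, because the expression uses a strict >.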
Do you remember we wanted to get the DOY of maximum Cl-a value before? Well, here's another way of doing it...
# overall maximum value
t.rast.series input=cla method=maximum output=max_cla
# new strds with DOY of overall maximum
t.rast.mapcalc -n inputs=cla output=date_max_cla expression="if(cla == max_cla,start_doy(),null())" basename=date_max_cla
# map with DOY of overall maximum
t.rast.series input=date_max_cla method=maximum output=max_cla_date
# remove date_max_cla strds (we were only interested in the resulting aggregated map)
t.remove -rf date_max_cla
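The three steps above can be traced for a single pixel with this Python sketch (`doy_of_maximum` is a hypothetical helper; None stands for null). It makes one consequence explicit: if the overall maximum occurs in more than one map, the final maximum returns the largest DOY among them, regardless of the year in which each occurred.

```python
def doy_of_maximum(series):
    """series: list of (start_doy, value) for one pixel, None for null cells.
    Returns the largest DOY at which the overall maximum occurs, or None if all are null."""
    valid = [value for _, value in series if value is not None]
    if not valid:
        return None
    overall_max = max(valid)  # step 1: t.rast.series method=maximum
    # step 2: if(cla == max_cla, start_doy(), null())
    doys = [doy for doy, value in series if value == overall_max]
    return max(doys)  # step 3: t.rast.series method=maximum on the DOY maps


print(doy_of_maximum([(1, 0.4), (9, 2.0), (17, 1.1)]))
```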
In the development version of GRASS GIS, there's also a t.rast.mapcalc2 module that allows for temporal and spatial operations on strds by means of temporal raster algebra. The module expects an expression as input parameter in the following form:
"result = expression"
The statement structure is similar to r.mapcalc, the result is the name of a new strds that will contain the result of the calculation given as expression. Expressions can be any valid or nested combination of temporal operations and spatial overlay or buffer functions that are provided by the temporal algebra. See the manual for further details and explanations.
We'll use this module to estimate the rate of change (slope) between every pair of consecutive maps in the "cla" strds. The result will be a new strds consisting of maps with the slope value between every consecutive pair of maps in the original strds.
t.rast.mapcalc2 expression="slope_cla = (cla[1]-cla[0])/8.0" basename=slope_cla
# we set 8 as fixed denominator, because products are 8-day compositions
We can then use any of the aggregation modules seen above to get the maximum or minimum rate of change for different granularities.
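The fixed denominator of 8 days is exact within a year, but the pair spanning the year change is only 5 days apart (6 after a leap year), because the last composite of each year is shorter. This Python sketch, with hypothetical values and a hypothetical `rates_of_change` helper, compares the slope computed with the fixed 8-day denominator against the slope computed with the actual number of days between start dates:

```python
import datetime


def rates_of_change(series, fixed_days=None):
    """series: list of (start_date, value) in time order.
    Returns (start_date, slope) per consecutive pair: (next - current) / days, where days
    is fixed_days if given, otherwise the actual number of days between start dates."""
    rates = []
    for (d0, v0), (d1, v1) in zip(series, series[1:]):
        days = fixed_days if fixed_days is not None else (d1 - d0).days
        rates.append((d0, (v1 - v0) / days))
    return rates


# Hypothetical values around the 2003/2004 year change
pixel = [
    (datetime.date(2003, 12, 19), 1.0),
    (datetime.date(2003, 12, 27), 1.8),
    (datetime.date(2004, 1, 1), 2.8),
]
print(rates_of_change(pixel, fixed_days=8.0))
print(rates_of_change(pixel))
```

Within 2003 both give 0.1 per day; for the 2003-12-27 to 2004-01-01 pair the fixed denominator gives 0.125 instead of 0.2.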
7. TODO
(add some more)
FAQ
Aggregation with defined granularity
Q: I need to aggregate a strds with a granularity of 1 year, but shifting the start day one month in each run, i.e. changing the start_time to 2003-02-01, 2003-03-01, 2003-04-01 and so on. My question is: if I iteratively change start_time with the 'where' parameter, will t.rast.aggregate aggregate up to the next February, March, April (what I'd like) or just up to the end of 2003?
A: If you specify a granularity of a year, then the start time to perform the aggregation will always be shifted to the 1st January of the current year and the end time the 1st January of the next year (eg. 2002-01-01 - 2003-01-01). If you wish to aggregate a full year but shifting one month forward then simply use a granularity of 12 months.
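The difference between both granularities can be illustrated with plain date arithmetic. This Python sketch only models the behaviour described in the answer (the helper functions are hypothetical, not TGRASS code): a "1 year" granule is snapped to 1 January, while a "12 months" granule starting on the first of any month ends on the same day one year later.

```python
import datetime


def add_months(first_of_month, months):
    """Shift a first-of-month date by a whole number of months."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return datetime.date(index // 12, index % 12 + 1, 1)


def one_year_granule(start):
    """'1 year' granularity as described above: snapped to 1 January of the start year."""
    return datetime.date(start.year, 1, 1), datetime.date(start.year + 1, 1, 1)


def twelve_month_granule(start):
    """'12 months' granularity: from the first of the start month, 12 months long."""
    begin = datetime.date(start.year, start.month, 1)
    return begin, add_months(begin, 12)


for month in (2, 3, 4):
    start = datetime.date(2003, month, 1)
    print(start, one_year_granule(start), twelve_month_granule(start))
```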
Using the original maps of a given month as series input
Q: I have a strds with 506 maps that correspond to 8-day composite products (11 years). I want to get the list of maps where "start_month" is January, February and so on... Now I want to use them as input in r.series (or t.rast.series), but how? Note that I do not want to aggregate maps by month, since I need to use the original maps belonging to each month. Is there a way to achieve that?
A: You can use the datetime functionality of SQLite to perform this task, this should work for January:
t.rast.list input=cla_null_mayor65 \
where='start_time >= datetime(start_time, "start of year") and
start_time <= datetime(start_time, "start of year", "1 month")'
Expert tricks
BE CAREFUL. THIS IS NOT RECOMMENDED FOR NEW USERS.
Creating a TGRASS DB with data from a different mapset
TGRASS is designed to only work with data present in the current mapset. An expert user may override this in order to register data from another mapset in his/her TGRASS database.
To achieve this, the following two internal variables must be set:
g.gisenv set="TGIS_DISABLE_MAPSET_CHECK=True"
g.gisenv set="TGIS_DISABLE_TIMESTAMP_WRITE=True"
in order to disable the mapset check and the writing of each map's timestamp as text files in the map metadata of the spatial database. These variables can be set per mapset.
Setting these variables to "True" should (hopefully, as this is still partially untested) allow the registration of maps outside the current mapset, even if you do not have permission to modify those maps.
A warning will be printed if these variables are set True.
BUT, be aware that this feature can lead to the corruption of the temporal database and unwanted side effects. You can mess up the temporal database if you are not 100% sure what you are doing. It is no longer possible to access the timestamp information of these maps using the C-libraries, because the timestamp information is not available in the map metadata text files.
More expert tricks
Perhaps to be added...
References
- Gebbert, S., Pebesma, E., 2014. TGRASS: A temporal GIS for field based environmental modeling. Environmental Modelling & Software 53, 1-12. DOI: http://dx.doi.org/10.1016/j.envsoft.2013.11.001
See also
- Introduction: GRASS as a Temporal GIS by Sören Gebbert (slides)
- Introduction to temporal modules (GRASS GIS manual)
- Available temporal modules (GRASS GIS manual)
- Temporal Extension (background info in trac)
- GRASS GSoC 2013 Temporal GIS Algebra for raster and vector data in GRASS