Store time series data to big matrix

technocrat · March 10, 2021, 7:16am

I've been lapped, it appears. Here's as far as I got; not very helpful, I fear.

#ts.moist1 is a matrix that contains 19-year moisture data at each geographic point (dimensions 705184 by 240)
#ts.moist2 is a matrix that contains 19-year moisture data at each geographic point, but different from ts.moist1 (dimensions 705184 by 240)
#ts.srad is a matrix that contains 19-year solar radiation data at each geographic point (dimensions 705184 by 240)
#ts.tair is a matrix that contains 19-year air temperature data at each geographic point (dimensions 705184 by 240)
#ts.tsoil is a matrix that contains 19-year soil temperature data at each geographic point (dimensions 705184 by 240)

Wow, that is a lot of data. I'm going to assume that "store" means a way to gather all of this into a single object for later extraction and processing.
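To put "a lot of data" in numbers, here is a back-of-the-envelope sketch. It assumes every value is stored as an 8-byte double and ignores R's per-object overhead and the separate coordinate matrix, so treat the result as a lower bound rather than a measurement.

```r
# Rough in-memory size of one 705184 x 240 matrix of doubles (8 bytes each)
n_points  <- 705184
n_months  <- 240
n_cells   <- n_points * n_months     # 169,244,160 values per matrix
bytes_one <- n_cells * 8             # 1,353,953,280 bytes
gib_one   <- bytes_one / 1024^3      # about 1.26 GiB

# Five such matrices (moist1, moist2, srad, tair, tsoil)
gib_all <- 5 * gib_one               # about 6.3 GiB

cat(sprintf("one matrix: %.2f GiB; all five: %.2f GiB\n", gib_one, gib_all))
```

So any single object holding all five variables at once needs several gigabytes of RAM before any copying happens.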

I was going to start by suggesting handling the spatial data as a simple features (sf) data frame, but I see that sf does not fully work with raster. So, I'll start instead by trying to better understand the desired output as an abstract object.

A matrix is a two-dimensional (2D) object in which all of the elements must share a single type, for example all numeric or all character, but not a mixture. An array is the analogous object in any number of dimensions; a matrix is simply a 2D array. So, to use a matrix, we must either find some way to stuff all of the data dimensions into two, or abandon the matrix class for the array class to allow higher dimensionality.
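A small illustration of those two rules in base R: mixing types in a matrix silently coerces everything to one type, and a matrix is just the 2D case of an array. The toy values are made up for the demonstration.

```r
# A matrix holds a single atomic type: mixing numbers and text coerces all to character
m_mixed <- matrix(c(1, 2, "a", "b"), nrow = 2)
typeof(m_mixed)        # "character"

# A matrix is simply a 2-D array
m_num <- matrix(1:6, nrow = 2)
is.array(m_num)        # TRUE
dim(m_num)             # 2 3

# An array extends the same idea to three or more dimensions
a <- array(1:24, dim = c(2, 3, 4))
dim(a)                 # 2 3 4
a[2, 3, 4]             # 24, the last element (arrays fill column-major)
```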

Let's start with an empty matrix

m <- matrix(nrow = 705184, ncol = 240)

This creates a matrix filled with NAs. The moisture, radiation, and temperature data at hand are matrices of the same dimensions, and another matrix, of dimensions 705184 by 457, represents the geographic coordinates:

g <- matrix(nrow = 705184, ncol = 457)
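One detail worth knowing about these empty matrices, shown here at toy size: `matrix()` with no data fills the cells with logical NA, so the first time numeric values are assigned R has to promote the whole matrix to double, which allocates a new vector. At 705184 by 240 that is a large extra allocation; starting from `NA_real_` is one way to avoid it.

```r
# matrix() with no data fills every cell with NA, and that NA is logical
m_small <- matrix(nrow = 4, ncol = 3)
typeof(m_small)          # "logical"
all(is.na(m_small))      # TRUE

# Assigning doubles promotes the whole matrix to double (a new vector is allocated)
m_small[, 1] <- c(16, 15, 23, 11)
typeof(m_small)          # "double"

# Preallocating with NA_real_ starts out as double, so no promotion is needed
m_pre <- matrix(NA_real_, nrow = 4, ncol = 3)
typeof(m_pre)            # "double"
```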

I'll make explicit the assumption that every m lies within g ($\forall m \in g$). If the m are "left justified" within g (that is, they do not overlap g[, 241:457]), g could be trimmed. If not, we might need something like this pseudocode

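for (i in seq_len(nrow(g))) is.some.m(g[i, ])  # pseudocode: is.some.m() is a hypothetical test for whether point i appears in m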

to align the spatial points with the other data.
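A runnable alternative to the loop, sketched under an assumption the thread does not confirm: that each row of the data matrices carries a point identifier that also appears in the coordinate table. The names `coords`, `moist`, and `point_id` and all values are hypothetical toy data. `match()` finds, for every data row, the position of its point in the coordinate table in one vectorised call.

```r
# Hypothetical coordinate table: every grid point once (the role of g)
coords <- data.frame(point_id = c("p1", "p2", "p3", "p4", "p5"),
                     x = c(10.0, 10.5, 11.0, 11.5, 12.0),
                     y = c(45.0, 45.0, 45.5, 45.5, 46.0))

# Hypothetical data matrix (the role of m): 3 points x 2 time steps, rows keyed by point ID
moist <- matrix(c(0.21, 0.25, 0.19,    # column t1
                  0.22, 0.27, 0.18),   # column t2
                nrow = 3,
                dimnames = list(c("p4", "p1", "p3"), c("t1", "t2")))

# Row position of each data row within coords (NA if the point is missing)
idx <- match(rownames(moist), coords$point_id)
stopifnot(!anyNA(idx))   # enforces the "every m is in g" assumption

# Coordinates aligned row-for-row with the data
aligned <- cbind(coords[idx, c("x", "y")], moist)
aligned
```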

Once we have the correspondence between each m and its location in g, we need to work in the temporal aspect. At its simplest, a time series is a numeric vector, v, with some additional attributes:

v <- c(16, 15, 23, 11, 25, 23, 20, 26, 12, 23, 22, 14, 13, 21, 29, 23, 16, 20, 18)

where the values are, say, temperatures in °C.
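In base R, those "additional attributes" are what `ts()` adds. In this sketch the start year 2001 is a hypothetical choice, since the thread does not say which 19 years are covered; for the 240 monthly columns, `frequency = 12` with a `start = c(year, month)` pair would be the analogous call.

```r
v <- c(16, 15, 23, 11, 25, 23, 20, 26, 12, 23, 22, 14, 13, 21, 29, 23, 16, 20, 18)

# Hypothetical start year: one value per year, 19 years
v_ts <- ts(v, start = 2001, frequency = 1)

attributes(v_ts)    # $tsp: 2001 2019 1, $class: "ts"
time(v_ts)          # 2001, 2002, ..., 2019
as.numeric(v_ts)    # the plain numeric vector underneath
```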

So, for each element in g we have zero or more v related to an m. This raises two problems: two v's cannot occupy the same position in g, and v isn't a single numeric value, so it won't "fit" in g unless we resort to an index number that points to an external v. That appears to leave two choices: create side-by-side matrices for each m, or abandon the attempt to populate g in favor of an array, a, based on g, treating m and v as dimensions in higher-N space.
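A minimal sketch of the array route, using small random matrices as hypothetical stand-ins for ts.moist1, ts.srad, and ts.tair. Because the matrices share the same points-by-time shape, they can be stacked as slices of a third "variable" dimension; at full size this array would need roughly the 6 GiB estimated for the five matrices.

```r
# Toy stand-ins: 3 points x 4 time steps each
set.seed(1)
n_pt <- 3
n_t  <- 4
moist1 <- matrix(runif(n_pt * n_t), n_pt, n_t)
srad   <- matrix(runif(n_pt * n_t, 100, 300), n_pt, n_t)
tair   <- matrix(runif(n_pt * n_t, -5, 30), n_pt, n_t)

# array() fills column-major, so each matrix becomes one [, , k] slice
a <- array(c(moist1, srad, tair),
           dim = c(n_pt, n_t, 3),
           dimnames = list(point = paste0("p", seq_len(n_pt)),
                           time  = paste0("t", seq_len(n_t)),
                           var   = c("moist1", "srad", "tair")))

dim(a)                                  # 3 4 3
identical(unname(a[, , "srad"]), srad)  # TRUE: the slice is the original matrix
```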

Either of these approaches strikes me as inconvenient for accessing observations. A single observation would take the form a[69872, 103, 2, 3] or some such for an array, and would be even uglier for side-by-side matrices.
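One way to soften the indexing problem, offered as a sketch rather than a fix, is to give the array named dimensions so that elements and whole series can be addressed by name. The point IDs, years, and variable names below are hypothetical.

```r
# Small 3-D array (point x time x variable) with named dimensions
a <- array(seq_len(2 * 3 * 2), dim = c(2, 3, 2),
           dimnames = list(point = c("p1", "p2"),
                           time  = c("2001", "2002", "2003"),
                           var   = c("tair", "tsoil")))

a[2, 3, 1]                 # positional index: hard to read
a["p2", "2003", "tair"]    # the same element, addressed by name
a["p2", , "tair"]          # full time series for one point and one variable
```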

All of which brings me round to ask: what is the motivation for storing the data other than in a RasterStack or RasterBrick object?

Sorry I don't have more to offer to this interesting question.

Edward · March 10, 2021, 5:01pm

The data were stored in a RasterBrick, which was then used to extract the values for each spatial point over time. Your comment is valuable, though. Thanks.

system · March 31, 2021, 5:01pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
