Problems in predicting point data from gridded data

The original post is here, but it has been closed.

If the gridded DFgrid dataset is like this:

longitude latitude elevation precip temp
1 44.00 -64.00 0.00 1.1 12
2 44.25 -64.00 0.25 1.5 12
3 44.50 -64.00 0.50 1 13
4 44.75 -64.00 0.75 1.2 14
5 45.00 -64.00 1.00 0.1 12
6 44.00 -63.75 0.25 0.1 12
7 44.25 -63.75 0.50 1.3 12
8 44.50 -63.75 0.75 1.4 8
9 44.75 -63.75 1.00 1.4 9
10 45.00 -63.75 1.25 1 12
11 44.00 -63.50 0.50 1.8 9
12 44.25 -63.50 0.75 1 8
13 44.50 -63.50 1.00 0.5 10
14 44.75 -63.50 1.25 0.6 11
15 45.00 -63.50 1.50 0.7 11
16 44.00 -63.25 0.75 1 10
17 44.25 -63.25 1.00 1 10
18 44.50 -63.25 1.25 1 6
19 44.75 -63.25 1.50 1.1 7
20 45.00 -63.25 1.75 1.2 0
21 44.00 -63.00 1.00 1 1
22 44.25 -63.00 1.25 1 0
23 44.50 -63.00 1.50 1.6 1
24 44.75 -63.00 1.75 1.6 1
25 45.00 -63.00 2.00 1 2

If I know the elevations at new_points (which change more abruptly than in DFgrid), how can I predict precip and temp at new_points?
For example, the elevations in DFgrid represent the average elevation of each grid cell. In new_points, the elevation at each station point may be higher or lower than the average elevation of its grid cell.
If new_points looks like this, how can I predict precipitation and temperature from DFgrid at new_points while accounting for elevation effects (specifically, the lapse rate for temperature and the orographic effect for precipitation)? Thanks for your help.

new_points <- tibble(
  longitude = c(44.1, 44.9),
  latitude = c(-63.9, -63.1),
  elevation = c(-5, 10)
)

Another question: what is the difference between a data frame and a tibble? Thanks very much.
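On the data frame versus tibble question, a short sketch of three differences that tend to matter in practice. It assumes the tibble package is installed; the two-row data are copied from new_points above.

```r
library(tibble)

df  <- data.frame(longitude = c(44.1, 44.9), latitude = c(-63.9, -63.1))
tbl <- tibble(longitude = c(44.1, 44.9), latitude = c(-63.9, -63.1))

# A tibble is a data frame with extra classes, so most data-frame code accepts it.
tbl_is_df <- is.data.frame(tbl)

# Selecting one column with `[` drops to a plain vector for a data.frame,
# but stays a one-column tibble.
df_col  <- df[, 1]
tbl_col <- tbl[, 1]

# `$` partially matches column names on a data.frame; a tibble does not,
# and returns NULL (with a warning) for an incomplete name.
df_partial  <- df$long
tbl_partial <- suppressWarnings(tbl$long)
```

Tibbles also print only the first rows and show column types, which is mostly a convenience for large tables.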

I cannot provide much guidance besides noting that the elevation data you have span the range 0 to 2 and the new_points have elevations of -5 and 10. I think you would need a strong theoretical basis to justify extrapolating a fit to that extent. Knowing nothing about this particular field, I cannot say how justifiable an extrapolation would be.

Yes, this is the problem. The elevation data from DFgrid represent the average elevation of each grid cell and have thus been smoothed. In my case, each grid cell is 0.5 degrees of latitude by 0.5 degrees of longitude. The elevation data in new_points represent the elevation of each station point. Thus, the elevation in new_points can be higher or lower than in DFgrid.

The function raster::extract works, but the extracted values in specific columns (i.e., columns 3 to 5) have 6 decimal places. How can I limit them to 2 decimal places? I tried

format(new_points[,3:5],nsmall=2)

but it does not work. Thanks.
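One likely reason format() appears not to work here: nsmall sets a minimum number of decimal places, not a maximum, and format() returns text rather than numbers (the result also has to be assigned to something). A small base-R sketch, using a few values in the style of the sample data:

```r
x <- c(1.502222, 0.165556, 12.123445)

# nsmall only guarantees *at least* 2 decimals, so longer values are not
# shortened, and the result is character rather than numeric.
formatted <- format(x, nsmall = 2)

# round() keeps the values numeric and limits them to 2 decimal places.
rounded <- round(x, 2)

# For display or export as text with exactly 2 decimals, sprintf() is one option.
as_text <- sprintf("%.2f", x)
```

Rounding is usually the better choice if the values will be used in further calculations; the text versions are only for presentation.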

Please post your full code.

The sample data and code are shown below. In my case, the precip and temp values can have many decimal places. Is it possible to limit them to 2 decimal places? Thanks.

library(dplyr)   # provides %>% and bind_cols()
library(tibble)  # provides tibble() and as_tibble()

DFgrid = data.frame(longitude = c(44.00, 44.50, 45.00, 44.00, 44.50, 45.00, 44.00, 44.50, 45.00),
                    latitude = c(-64.00, -64.00, -64.00, -63.50, -63.50, -63.50, -63.00, -63.00, -63.00),
                    elevation = c( 0.0, 0.5, 1.0, 0.5, 1.0, 1.5, 1.0, 1.5, 2.0),
                    precip = c(1.001000,1.502222,1,1.211000,0.165556,0.100122,1.330709,1.4,1.890990),
                    temp = c(12.123445,8.210112,9.155560,10.1,11,6.556890,2.2,4.112335,3))

# create a raster object using all the attributes (X and Y first)
rast_obj <- DFgrid %>%
  raster::rasterFromXYZ()

# create an X, Y data frame
new_points <- tibble(
  longitude = c(44.1, 44.9),
  latitude = c(-63.9, -63.1),
  elevation = c(-5, 10)
)

# predict values at the point locations
pnt.pred = bind_cols(
  new_points[,1:2],
  as_tibble(raster::extract(rast_obj, new_points[,1:2]))
)

I used the code below on specific columns and want to export the new dataset as a table, but it doesn't work.

pnt.pred2= format(pnt.pred[,4:5],nsmall=1)

You can use mutate_at() from dplyr with round().

library(tibble)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

DFgrid = data.frame(longitude = c(44.00, 44.50, 45.00, 44.00, 44.50, 45.00, 44.00, 44.50, 45.00),
                    latitude = c(-64.00, -64.00, -64.00, -63.50, -63.50, -63.50, -63.00, -63.00, -63.00),
                    elevation = c( 0.0, 0.5, 1.0, 0.5, 1.0, 1.5, 1.0, 1.5, 2.0),
                    precip = c(1.001000,1.502222,1,1.211000,0.165556,0.100122,1.330709,1.4,1.890990),
                    temp = c(12.123445,8.210112,9.155560,10.1,11,6.556890,2.2,4.112335,3))

#create a raster object using all the attributes (X and Y first)
rast_obj <- DFgrid %>%
  raster::rasterFromXYZ()

#create an X, Y data frame
new_points <- tibble(
  longitude = c(44.1, 44.9),
  latitude = c(-63.9, -63.1),
  elevation = c(-5, 10)
)

#predict values at the point locations
pnt.pred = bind_cols(
  new_points[,1:2],
  as_tibble(raster::extract(rast_obj, new_points[,1:2]))
)

pnt.pred <- pnt.pred %>% mutate_at(c(4,5), round, digits = 2)

Created on 2019-05-07 by the reprex package (v0.2.1)
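In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A sketch of the same rounding step written that way, selecting columns by name rather than position and then exporting the table. The small pnt.pred tibble below is a made-up stand-in so the example runs without the raster package, and the output file name is arbitrary.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the pnt.pred object built above.
pnt.pred <- tibble(
  longitude = c(44.1, 44.9),
  latitude  = c(-63.9, -63.1),
  elevation = c(0, 2),
  precip    = c(1.001000, 1.890990),
  temp      = c(12.123445, 3)
)

# Round only the precip and temp columns to 2 decimal places.
pnt.pred2 <- pnt.pred %>%
  mutate(across(c(precip, temp), ~ round(.x, digits = 2)))

# Export the rounded table; a temporary directory is used here for illustration.
out_file <- file.path(tempdir(), "pnt_pred.csv")
write.csv(pnt.pred2, out_file, row.names = FALSE)
```

Selecting by name avoids silently rounding the wrong columns if the column order changes.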

Thanks, the code works. But are there only two interpolation methods available in the extract function, simple and bilinear? Is it possible to try other interpolation methods, such as nearest neighbor, when predicting point values?

In my data, there are elevation values in new_points, but the values predicted using "extract" are quite different from the observed values in new_points. Are there other interpolation methods to choose from? The grid cell size in DFgrid is large, so the average elevation of each grid cell may not represent the elevation at a specific location in new_points. Thanks again.

I assume the default method is "simple". I just checked the function extract:

If 'simple' values for the cell a point falls in are returned. If 'bilinear' the returned values are interpolated from the values of the four nearest raster cells.

So does that mean that the value for the cell centroid, rather than for the point location, is returned? How can I do nearest neighbor with this approach?

Here is a small part of my data:
DFgrid represents the equal-sized grid cells, and new_points represents the stations at which I want to predict precip from the surrounding grid cells. The actual elevation at this point is -21.1 m, but the elevation predicted from the surrounding grid cells is 2188 m. I think this influences the predicted precip value. What is the mechanism of "simple" interpolation here? Thanks.

DFgrid = data.frame(longitude = c(48.6250, 48.3750, 48.6250, 48.3750),
                    latitude = c(38.1250, 38.1250, 38.3750, 38.3750),
                    elevation = c(2188,1413,795,1307),
                    precip = c(10,15.4,1,1.211))

rast_obj <- DFgrid %>%
  raster::rasterFromXYZ()

new_points <- tibble(longitude = 48.51,
                     latitude = 38.22,
                     elevation = -21.1)

pnt.pred = bind_cols(
  new_points[,1:2],
  as_tibble(raster::extract(rast_obj, new_points[,1:2]))
)

I suspect that extract() will not get you what you want. The simple method just returns the assigned value for the cell that contains the point. The bilinear method does interpolate based on the neighboring cells, but it uses the cell-mean values assigned to them. Since your elevations seem to vary a lot within each cell, the elevation used for the interpolation may be far from the actual elevation.
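To make the two mechanisms concrete, here is a base-R sketch for the four-cell example. It assumes the standard textbook definition of bilinear interpolation and has not been checked against the raster source code. The helper name bilinear_2x2 is hypothetical, and it only handles a point lying between the four cell centres.

```r
# Cell centres and values from the four-cell example above.
DFgrid <- data.frame(
  longitude = c(48.625, 48.375, 48.625, 48.375),
  latitude  = c(38.125, 38.125, 38.375, 38.375),
  elevation = c(2188, 1413, 795, 1307),
  precip    = c(10, 15.4, 1, 1.211)
)
pt <- c(longitude = 48.51, latitude = 38.22)

# "simple": use the cell that contains the point. On a regular grid of square
# cells this is the cell whose centre is nearest, i.e. nearest neighbor.
d2 <- (DFgrid$longitude - pt[["longitude"]])^2 +
      (DFgrid$latitude  - pt[["latitude"]])^2
simple <- DFgrid[which.min(d2), c("elevation", "precip")]

# "bilinear": weight the four cell-centre values by the point's relative
# position between the centres (hypothetical helper, 2 x 2 grids only).
bilinear_2x2 <- function(grid, pt, value) {
  xs <- sort(unique(grid$longitude))
  ys <- sort(unique(grid$latitude))
  tx <- (pt[["longitude"]] - xs[1]) / (xs[2] - xs[1])
  ty <- (pt[["latitude"]]  - ys[1]) / (ys[2] - ys[1])
  v <- function(x, y) grid[[value]][grid$longitude == x & grid$latitude == y]
  low  <- (1 - tx) * v(xs[1], ys[1]) + tx * v(xs[2], ys[1])
  high <- (1 - tx) * v(xs[1], ys[2]) + tx * v(xs[2], ys[2])
  (1 - ty) * low + ty * high
}

bilinear_precip    <- bilinear_2x2(DFgrid, pt, "precip")
bilinear_elevation <- bilinear_2x2(DFgrid, pt, "elevation")
```

Here the "simple" lookup picks the cell centred at (48.625, 38.125), which is where the 2188 m elevation comes from. The bilinear weights do depend on the distance to each cell centre, but the inputs are still cell-mean values, so neither method sees the station's actual elevation of -21.1 m.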

My very tentative speculation is that you need to construct your own model of the variation of precipitation and temperature with longitude, latitude and elevation. If you have data for actual precipitation and temperature across your whole range of elevations, it may be adequate to use the lm() function. You would have to determine whether a model with just elevation, latitude and longitude is sufficient or whether other terms, such as elevation^2 or interaction terms, are needed. If you do not have precipitation and temperature for the full range of elevation, then you need to have some basis from which to build extrapolations.
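One possible basis for temperature, which the original question already mentions, is a lapse rate: shift the gridded temperature by the elevation difference between the station and the cell mean. The sketch below is only an illustration under stated assumptions. Elevations are taken to be in metres, the default of -6.5 degrees C per km is the standard-atmosphere value and may not suit a given region or season, the function name adjust_temp is hypothetical, and the grid temperature of 5 is a made-up number.

```r
# Adjust a gridded temperature to a station elevation with a constant lapse rate.
# lapse_rate is in degrees C per metre; -0.0065 is an assumed default.
adjust_temp <- function(temp_grid, elev_grid, elev_station, lapse_rate = -0.0065) {
  temp_grid + lapse_rate * (elev_station - elev_grid)
}

# Elevations from the thread (cell mean 2188 m, station -21.1 m).
# The grid temperature of 5 degrees C is hypothetical.
temp_station <- adjust_temp(temp_grid = 5, elev_grid = 2188, elev_station = -21.1)
```

A lapse rate estimated from local station data would be preferable to the assumed default. Precipitation has no comparably simple correction, so this sketch does not attempt one.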

Keep in mind that I have not seen your data and I have no experience with data like these. I am just speaking from general experience.

Thanks for your explanation. So the bilinear method does not use distance weighting or something similar, but rather the average of the four grid cells? I don't know how to write such a script. My data have regularly spaced grid cells in DFgrid, and irregularly and sparsely spaced station points in new_points. There are 12 monthly values in DFgrid, and I want to predict 12 monthly values from DFgrid at new_points.

Is it possible to do trilinear interpolation using a similar method? I have already formatted my dataset this way. Thanks for any help.

Yes, the lm() function can be used with three predictor variables. How to do such a regression well is a big topic. I suggest this text up through Chapter 3, though there are many good sources.
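A minimal sketch of such a regression with three predictors, using made-up station data rather than anything from the thread. The 9-row sample DFgrid is not used because its elevation equals (longitude - 44) + (latitude + 64) exactly, so lm() could not estimate separate coefficients for all three predictors. The coefficients used to simulate the data, including the -0.0065 elevation effect, are arbitrary assumptions.

```r
set.seed(1)  # reproducible made-up data

# Hypothetical station observations; none of these values come from the thread.
n <- 30
stations <- data.frame(
  longitude = runif(n, 44, 45),
  latitude  = runif(n, -64, -63),
  elevation = runif(n, 0, 2000)   # metres, assumed
)
stations$temp <- 15 - 0.0065 * stations$elevation +
  0.5 * (stations$latitude + 63.5) + rnorm(n, sd = 0.2)

# Linear model with the three predictors discussed above.
fit <- lm(temp ~ longitude + latitude + elevation, data = stations)

# Predict at new locations whose elevations lie inside the fitted range.
new_points <- data.frame(
  longitude = c(44.1, 44.9),
  latitude  = c(-63.9, -63.1),
  elevation = c(250, 1500)
)
pred <- predict(fit, newdata = new_points, interval = "prediction")
```

The prediction intervals in pred widen quickly outside the range of the training data, which is the extrapolation concern raised earlier in the thread.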
