Interpolate a value from a data frame based on two inputs

I have a data frame that looks like this:

   Teff  logg M_div_H       U       B      V      R      I     J     H     K     L Lprime     M
 1: 2000  4.00    -0.1 -13.443 -11.390 -7.895 -4.464 -1.831 1.666 3.511 2.701 4.345  4.765 5.680
 2: 2000  4.50    -0.1 -13.402 -11.416 -7.896 -4.454 -1.794 1.664 3.503 2.728 4.352  4.772 5.687
 3: 2000  5.00    -0.1 -13.358 -11.428 -7.888 -4.431 -1.738 1.664 3.488 2.753 4.361  4.779 5.685
 4: 2000  5.50    -0.1 -13.220 -11.079 -7.377 -4.136 -1.483 1.656 3.418 2.759 4.355  4.753 5.638
 5: 2200  3.50    -0.1 -11.866  -9.557 -6.378 -3.612 -1.185 1.892 3.294 2.608 3.929  4.289 4.842
 6: 2200  4.50    -0.1 -11.845  -9.643 -6.348 -3.589 -1.132 1.874 3.310 2.648 3.947  4.305 4.939
 7: 2200  5.50    -0.1 -11.655  -9.615 -6.279 -3.508 -0.997 1.886 3.279 2.709 3.964  4.314 4.928
 8: 2500 -1.02    -0.1  -7.410  -7.624 -6.204 -3.854 -1.533 1.884 3.320 2.873 3.598  3.964 5.579
 9: 2500 -0.70    -0.1  -7.008  -7.222 -5.818 -3.618 -1.338 1.905 3.266 2.868 3.502  3.877 5.417
10: 2500 -0.29    -0.1  -6.526  -6.740 -5.357 -3.421 -1.215 1.927 3.216 2.870 3.396  3.781 5.247
11: 2500  5.50    -0.1  -9.518  -7.575 -5.010 -2.756 -0.511 1.959 3.057 2.642 3.472  3.756 4.265
12: 2800 -1.02    -0.1  -7.479  -7.386 -5.941 -3.716 -1.432 1.824 3.259 2.812 3.567  3.784 5.333
13: 2800 -0.70    -0.1  -7.125  -7.032 -5.596 -3.477 -1.231 1.822 3.218 2.813 3.479  3.717 5.229
14: 2800 -0.29    -0.1  -6.673  -6.580 -5.154 -3.166 -0.974 1.816 3.163 2.812 3.364  3.628 5.093
15: 2800  3.50    -0.1  -8.113  -6.258 -4.103 -2.209 -0.360 1.957 2.872 2.517 3.219  3.427 4.026
16: 2800  4.00    -0.1  -7.992  -6.099 -3.937 -2.076 -0.230 1.907 2.869 2.480 3.227  3.424 4.075
17: 2800  4.50    -0.1  -7.815  -6.051 -4.067 -2.176 -0.228 1.920 2.877 2.503 3.212  3.428 4.000
18: 2800  5.00    -0.1  -7.746  -6.018 -4.031 -2.144 -0.176 1.907 2.883 2.512 3.216  3.430 4.023
19: 3000 -0.70    -0.1  -7.396  -6.995 -5.605 -3.554 -1.293 1.787 3.172 2.759 3.474  3.588 5.052
20: 3000 -0.29    -0.1  -6.966  -6.565 -5.179 -3.249 -1.035 1.772 3.136 2.764 3.388  3.533 4.978

Here is the link to the entire data frame: https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=0
Notice, for example, that each V value corresponds to a unique (Teff, logg) combination, so we can think of all the (Teff, logg) combinations as grid points.
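A quick way to check the grid idea in R is to look for repeated (Teff, logg) pairs. The sketch below uses only a few rows copied from the table above (the name `DF_sample` is hypothetical); the same `anyDuplicated()` call could be run on the full data frame once it is loaded. It also shows that the logg values differ from one Teff to the next, so the grid is irregular rather than rectangular.

```r
# A few (Teff, logg, V) rows copied from the table above (illustrative subset)
DF_sample <- data.frame(
  Teff = c(2000, 2000, 2000, 2000, 2200, 2200, 2200, 2500, 2500, 2500),
  logg = c(4.00, 4.50, 5.00, 5.50, 3.50, 4.50, 5.50, -1.02, -0.70, -0.29),
  V    = c(-7.895, -7.896, -7.888, -7.377, -6.378, -6.348, -6.279,
           -6.204, -5.818, -5.357)
)

# anyDuplicated() returns 0 when no (Teff, logg) pair appears twice
dup_index <- anyDuplicated(DF_sample[, c("Teff", "logg")])
dup_index

# The logg values available at each Teff: they are not the same for every Teff
grid_by_teff <- split(DF_sample$logg, DF_sample$Teff)
grid_by_teff
```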

Now, let's say I have two values that make up an input point:

input_Teff = 2300
input_log_g = 3.86

From the input values, I would like to interpolate a value for V so that, when I plot V as a function of some independent variable, V is continuous instead of a series of discrete values. Is there a way to do this in R?

It seems you want to fit V values with respect to Teff and logg. Here is one way to do that with the loess() function. Besides doing the fit, I drew a contour plot of the predicted fit and plotted a few points from the original data on it. The fit seems to match the original data reasonably well. For example, the red point with a V of -0.596 falls between the contours with values of 0 and -1.

# Read the data file (space-separated, with a header row)
DF <- read.csv("~/R/Play/lcb98cor.dat", sep = " ")
# Fit V as a smooth local-regression surface over Teff and logg
FIT <- loess(V ~ Teff + logg, data = DF)

# Define the new point and predict its V value
NewPt <- data.frame(Teff = 2300, logg = 3.86)
NewPt$Pred <- predict(object = FIT, newdata = NewPt)
NewPt
#>   Teff logg     Pred
#> 1 2300 3.86 2.892414

# Make a grid of points to use in a contour plot
Teff_seq <- seq(2000, 50000, 2000)
logg_seq <- seq(-1, 5.5, 0.2)
GRID <- expand.grid(Teff = Teff_seq, logg = logg_seq)
# Predict the V values on the grid; because GRID comes from expand.grid(),
# predict() returns a matrix with one row per Teff and one column per logg
POINTS <- predict(FIT, GRID)

# Pick a few rows from the original data to plot on the contour
Samples <- DF[c(460, 6653, 7151, 5505, 4379, 448), ]

contour(x = Teff_seq, y = logg_seq, z = POINTS)
# Label the sampled points with their V values
text(x = Samples$Teff, y = Samples$logg, labels = Samples$V, pos = 1, cex = 0.7)
# Plot the sampled points
points(x = Samples$Teff, y = Samples$logg, col = "red")

Created on 2021-08-05 by the reprex package (v0.3.0)
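For a version that runs without downloading the file, the same loess() approach can be sketched on just the 20 rows shown in the question (the data frame name `DF20` is hypothetical). It predicts V at the input point and then along a fine sequence of Teff values at logg = 3.86, which gives the kind of continuous curve the question asks for. With only 20 rows the fitted numbers will differ from the full-data result above, so treat them as illustrative only.

```r
# The 20 rows shown in the question (Teff, logg and V columns only)
DF20 <- data.frame(
  Teff = c(rep(2000, 4), rep(2200, 3), rep(2500, 4), rep(2800, 7), rep(3000, 2)),
  logg = c(4.00, 4.50, 5.00, 5.50,
           3.50, 4.50, 5.50,
           -1.02, -0.70, -0.29, 5.50,
           -1.02, -0.70, -0.29, 3.50, 4.00, 4.50, 5.00,
           -0.70, -0.29),
  V = c(-7.895, -7.896, -7.888, -7.377,
        -6.378, -6.348, -6.279,
        -6.204, -5.818, -5.357, -5.010,
        -5.941, -5.596, -5.154, -4.103, -3.937, -4.067, -4.031,
        -5.605, -5.179)
)

# Local regression surface V(Teff, logg) with the default span and degree
fit <- loess(V ~ Teff + logg, data = DF20)

# Predicted V at the input point from the question
pred_input <- predict(fit, newdata = data.frame(Teff = 2300, logg = 3.86))
pred_input

# Predicted V along a fine Teff sequence at fixed logg: a continuous curve
Teff_seq <- seq(2000, 3000, by = 50)
V_curve <- predict(fit, newdata = data.frame(Teff = Teff_seq, logg = 3.86))
plot(Teff_seq, V_curve, type = "l", xlab = "Teff", ylab = "V (loess fit)")
```

By default loess() uses surface = "interpolate", and predict() then returns NA for points outside the box spanned by the data, so the Teff sequence here is kept within 2000 to 3000.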

@FJCC This is a huge help. Thanks!

Is LOESS supposed to take long? May I ask why you chose LOESS here?

I chose the loess() function because it does a local fit and I didn't know how smoothly or linearly V varies. You could also use a linear model in much the same way with the lm() function. You can include an interaction term or variables raised to a power. There are many ways to do such regressions, but without understanding the data, it doesn't make sense for me to do anything elaborate.
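The lm() alternative can be sketched the same way, again on only the 20 rows from the question. The particular formula (a Teff:logg interaction plus squared terms) is an assumption chosen to illustrate the syntax, not a recommendation for this data; anova() then compares it with a plain additive model. Unlike the default loess fit, an lm() fit will happily extrapolate outside the data range, so predictions far from the grid should be treated with caution.

```r
# The 20 rows shown in the question (Teff, logg and V columns only)
DF20 <- data.frame(
  Teff = c(rep(2000, 4), rep(2200, 3), rep(2500, 4), rep(2800, 7), rep(3000, 2)),
  logg = c(4.00, 4.50, 5.00, 5.50,
           3.50, 4.50, 5.50,
           -1.02, -0.70, -0.29, 5.50,
           -1.02, -0.70, -0.29, 3.50, 4.00, 4.50, 5.00,
           -0.70, -0.29),
  V = c(-7.895, -7.896, -7.888, -7.377,
        -6.378, -6.348, -6.279,
        -6.204, -5.818, -5.357, -5.010,
        -5.941, -5.596, -5.154, -4.103, -3.937, -4.067, -4.031,
        -5.605, -5.179)
)

# Plain additive linear model
fit_add <- lm(V ~ Teff + logg, data = DF20)
# Teff * logg expands to Teff + logg + Teff:logg; I() adds squared terms
fit_lm <- lm(V ~ Teff * logg + I(Teff^2) + I(logg^2), data = DF20)

# Compare the nested models with an F-test
cmp <- anova(fit_add, fit_lm)
cmp

# Predicted V at the input point from the question
pred_lm <- predict(fit_lm, newdata = data.frame(Teff = 2300, logg = 3.86))
pred_lm
```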
