I'm not sure if there is a specific name for it, but this is what I would like to do (where y is just one row of data):
Hi @pathos
Does this help?
x <- data.frame(
var1 = 1:3,
var2 = 2:4,
var3 = 3:5
)
y1 <- 2
y2 <- 7
x$desired_output <- (x$var1 * y1) + x$var2 + (x$var3 * y2)
x
var1 var2 var3 desired_output
1 1 2 3 25
2 2 3 4 35
3 3 4 5 45
Ah thanks, but there could be infinite number of variables, so this manual hard coding of each variables wouldn't work.
Oh okay, well, it was not really clear at first why the values in the y
data were named var1
and var3
. This is because they are used to multiply the values of in the columns of similar names in x
.
I tried to expand the data a little bit. Also, I am not sure you are familiar with matrix algebra, but there's a lot of it being done here actually. So, here is my solution:
x <- data.frame(
var1 = 1:3,
var2 = 2:4,
var3 = 3:5,
var4 = 4:6,
var5 = 5:7
)
y <- data.frame(
var1 = 2,
var3 = 7,
var5 = 3
)
compute_desired_output <- function(x, y){
common_cols <- intersect(colnames(x), colnames(y))
uncommon_cols <- setdiff(colnames(x), colnames(y))
m1 <- as.matrix(x[, common_cols]) %*% t(as.matrix(y[, common_cols]))
m2 <- apply(t(t(x[, uncommon_cols])), 1, sum)
as.vector(m1) + m2
}
compute_desired_output(x = x, y = y)
[1] 44 58 72
I'll suggest to use rowSums
here.
Something like this will work as well I think.
compute_desired_output <- function(x, y){
uncommon_cols <- setdiff(colnames(x), colnames(y))
y[uncommon_cols] <- 1
y <- y[colnames(x)]
as.matrix(x) %*% t(y)
}
Thanks, clever solutions!
I tried it, but the last part is throwing me this error requires numeric/complex matrix/vector arguments
So after a bit of searching online, I set both of them as.matrix
like so: as.matrix(x) %*% t(as.matrix(y))
and still throwing the same error.
Any idea why this might be? I'm not sure why they're not recognised as matrices.
Hi!
To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:
I think you're missing var2 in y to be 1. I think the solution is much simpler and this is matrix multiplication.
x <- matrix(c(1,2,3,2,3,4,3,4,5), nrow=3, byrow=TRUE)
y <- matrix(c(2, 1, 7), nrow=1)
x %*% t(y)
#> [,1]
#> [1,] 25
#> [2,] 35
#> [3,] 45
Created on 2021-08-06 by the reprex package (v2.0.0)
It seems like the code breaks down when there is a date variable, as shown in reprex
below. I'm now guessing that unmatched variables in x
has to be set aside then stitched together after with either left
or full_join
or cbind
. Or is there a more elegant solution?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
x = data.frame(
var1 = 1:3,
var2 = 2:4,
var3 = 3:5,
var4 = 4:6,
var5 = 5:7,
yearr = sample(2015:2021, 3, replace = TRUE),
monthh = sample(1:12, 3, replace = TRUE),
dayy = sample(1:28, 3, replace = TRUE)) |>
mutate(datee = ymd(paste(yearr, monthh, dayy))) |>
select(-yearr, -monthh, -dayy)
y = data.frame(
var1 = 2,
var3 = 7,
var5 = 3
)
compute_desired_output = function(x, y){
uncommon_cols = setdiff(colnames(x), colnames(y))
y[uncommon_cols] = 1
y = y[colnames(x)]
as.matrix(x) %*% t(y)
}
compute_desired_output(x = x, y = y)
#> Error in as.matrix(x) %*% t(y): requires numeric/complex matrix/vector arguments
Created on 2021-08-09 by the reprex package (v2.0.0)
This is algebra, I am not sure why you would put a date in the data. You cannot do algebra with dates. Or is there anything I am not understanding clearly?
Ah just a smaller part of a bigger problem
Well, now my line of thinking is that since the dates/row order has to be preserved, *_join
wouldn't work, so I guess I will have to use cbind
.
Solution:
compute_desired_output(x = x |> select(-datee), y = y)
df_output = x |>
cbind(compute_desired_output)
Yes, this should really be the solution. It's readable, simple and transparent linear algebra, everyone knows what's going on at first glance. The rest of solutions are non-intuitive and add complexity.
Bravo StatSteph
Second that! This is by far the most intuitive and concise solution!
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.