How would I execute this algebra operation?

I'm not sure if there is a specific name for it, but this is what I would like to do (where y is just one row of data):


Hi @pathos

Does this help?

x <- data.frame(
  var1 = 1:3,
  var2 = 2:4,
  var3 = 3:5
)

y1 <- 2
y2 <- 7

x$desired_output <- (x$var1 * y1) + x$var2 + (x$var3 * y2)

x

  var1 var2 var3 desired_output
1    1    2    3             25
2    2    3    4             35
3    3    4    5             45
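The hard-coded line is a weighted row sum: var1 is weighted by y1, var2 by an implicit 1, and var3 by y2. Below is a minimal sketch of the same computation written as a matrix-vector product in base R. The named weight vector w is an illustrative name and is not part of the original answer.

```r
x <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 3:5)

# Weights in the same order as the columns of x:
# y1 = 2 for var1, 1 for var2 (added unweighted), y2 = 7 for var3
w <- c(var1 = 2, var2 = 1, var3 = 7)

# as.matrix() gives a 3 x 3 numeric matrix; %*% forms each row's weighted sum.
# drop() turns the 3 x 1 result into a plain vector.
desired_output <- drop(as.matrix(x) %*% w)
desired_output
# [1] 25 35 45
```

Because the weights are now a vector, adding more variables means lengthening w instead of editing the formula.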

Ah thanks, but there could be any number of variables, so hard-coding each variable like this wouldn't work.

Oh okay. It wasn't clear at first why the values in the y data were named var1 and var3: they are used to multiply the values in the columns of the same name in x.

I expanded the data a little. Also, I'm not sure whether you're familiar with matrix algebra, but quite a lot of it is going on here. So, here is my solution:

x <- data.frame(
  var1 = 1:3,
  var2 = 2:4,
  var3 = 3:5,
  var4 = 4:6,
  var5 = 5:7
)

y <- data.frame(
  var1 = 2,
  var3 = 7,
  var5 = 3
)

compute_desired_output <- function(x, y){
  common_cols <- intersect(colnames(x), colnames(y))
  uncommon_cols <- setdiff(colnames(x), colnames(y))
  
  m1 <- as.matrix(x[, common_cols]) %*% t(as.matrix(y[, common_cols]))
  m2 <- apply(t(t(x[, uncommon_cols])), 1, sum)
  
  as.vector(m1) + m2
}

compute_desired_output(x = x, y = y)

[1] 44 58 72

I'd suggest using rowSums here.
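The rowSums suggestion can be sketched as the first function with apply(t(t(...)), 1, sum) swapped for rowSums(). This assumes, as in the example, that y is a one-row data frame whose column names all appear in x; drop = FALSE keeps a single-column selection as a data frame so the matrix and row-sum steps still work. The names weighted and unweighted are illustrative.

```r
x <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 3:5, var4 = 4:6, var5 = 5:7)
y <- data.frame(var1 = 2, var3 = 7, var5 = 3)

compute_desired_output <- function(x, y) {
  common_cols <- intersect(colnames(x), colnames(y))
  uncommon_cols <- setdiff(colnames(x), colnames(y))

  # Weighted part: columns found in both x and y, times y's (single-row) values
  weighted <- as.matrix(x[, common_cols, drop = FALSE]) %*% unlist(y[common_cols])

  # Unweighted part: the remaining columns of x, summed row by row
  unweighted <- rowSums(x[, uncommon_cols, drop = FALSE])

  as.vector(weighted) + as.vector(unweighted)
}

compute_desired_output(x, y)
# [1] 44 58 72
```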

Something like this should work as well, I think.

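compute_desired_output <- function(x, y){
    # Columns of x that have no matching weight in y
    uncommon_cols <- setdiff(colnames(x), colnames(y))

    # Give those columns a weight of 1 and put y's columns in x's order
    y[uncommon_cols] <- 1
    y <- y[colnames(x)]
    
    # Row-by-row weighted sum as a matrix product (returns a one-column matrix)
    as.matrix(x) %*% t(y)
}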
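To see what the padding does, here are the function's steps run on their own with the expanded x and y from the earlier answer. Note that the result of the function is a 3 x 1 matrix; drop() is used here (an addition, not part of the answer) to get a plain vector.

```r
x <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 3:5, var4 = 4:6, var5 = 5:7)
y <- data.frame(var1 = 2, var3 = 7, var5 = 3)

# Padding step: weight 1 for var2 and var4, then columns in x's order
y[setdiff(colnames(x), colnames(y))] <- 1
y <- y[colnames(x)]
unlist(y)
# var1 var2 var3 var4 var5 
#    2    1    7    1    3 

# Matrix product, flattened to a vector
drop(as.matrix(x) %*% t(y))
# [1] 44 58 72
```

One design consequence: y[colnames(x)] silently drops any column of y that has no counterpart in x, so a misspelled name in y would be ignored rather than raise an error.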

Thanks, clever solutions!
I tried it, but the last line throws this error: requires numeric/complex matrix/vector arguments

After some searching online, I converted both to matrices, as.matrix(x) %*% t(as.matrix(y)), but it still throws the same error.

Any idea why this might be? I'm not sure why they're not recognised as matrices.

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:

I think you're missing var2 in y, with a value of 1. I think the solution is much simpler: this is just matrix multiplication.

x <- matrix(c(1,2,3,2,3,4,3,4,5), nrow=3, byrow=TRUE)
y <- matrix(c(2, 1, 7), nrow=1)

x %*% t(y)
#>      [,1]
#> [1,]   25
#> [2,]   35
#> [3,]   45

Created on 2021-08-06 by the reprex package (v2.0.0)
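Base R also provides tcrossprod(), which computes x %*% t(y) in a single call. A small check against the matrices above (res is an illustrative name):

```r
x <- matrix(c(1, 2, 3, 2, 3, 4, 3, 4, 5), nrow = 3, byrow = TRUE)
y <- matrix(c(2, 1, 7), nrow = 1)

# tcrossprod(x, y) is defined as x %*% t(y)
res <- tcrossprod(x, y)
res
#      [,1]
# [1,]   25
# [2,]   35
# [3,]   45

# Same values as the explicit product
all.equal(res, x %*% t(y))
# [1] TRUE
```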


It seems the code breaks when there is a date variable, as shown in the reprex below. My guess now is that the unmatched variables in x have to be set aside and then stitched back on afterwards with left_join, full_join, or cbind. Or is there a more elegant solution?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

x = data.frame(
  var1 = 1:3,
  var2 = 2:4,
  var3 = 3:5,
  var4 = 4:6,
  var5 = 5:7,
  yearr = sample(2015:2021, 3, replace = TRUE),
  monthh = sample(1:12, 3, replace = TRUE),
  dayy = sample(1:28, 3, replace = TRUE)) |>
    mutate(datee = ymd(paste(yearr, monthh, dayy))) |>
    select(-yearr, -monthh, -dayy)

y = data.frame(
  var1 = 2,
  var3 = 7,
  var5 = 3
)

compute_desired_output = function(x, y){
  uncommon_cols = setdiff(colnames(x), colnames(y))
  
  y[uncommon_cols] = 1
  y = y[colnames(x)]
  
  as.matrix(x) %*% t(y)
}

compute_desired_output(x = x, y = y)
#> Error in as.matrix(x) %*% t(y): requires numeric/complex matrix/vector arguments

Created on 2021-08-09 by the reprex package (v2.0.0)
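The error comes from as.matrix(): when a data frame contains a Date column, as.matrix() returns a character matrix, and %*% only accepts numeric or complex input. A minimal sketch of the diagnosis, using a small data frame with fixed dates chosen for illustration (they are not the reprex's random dates):

```r
x <- data.frame(
  var1 = 1:3,
  var2 = 2:4,
  datee = as.Date(c("2016-03-01", "2018-07-15", "2020-11-30"))
)

# The Date column makes as.matrix() fall back to a character matrix,
# which %*% rejects with "requires numeric/complex matrix/vector arguments"
typeof(as.matrix(x))
# [1] "character"

# Keeping only numeric columns (is.numeric() is FALSE for Dates) fixes the type
num_cols <- vapply(x, is.numeric, logical(1))
typeof(as.matrix(x[num_cols]))
# [1] "integer"
```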

This is algebra; I'm not sure why you would put a date in the data. You can't do algebra with dates. Or is there something I'm not understanding?

Ah, it's just a small part of a bigger problem.

Well, my thinking now is that since the dates and the row order have to be preserved, a *_join wouldn't work, so I guess I'll have to use cbind.
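As an alternative to cbind, a result computed from the same rows can simply be assigned as a new column, which keeps it aligned with the dates without any join. A minimal base R sketch, assuming every column name in y is a numeric column of x; num_cols and w are illustrative names, and the dates are fixed example values:

```r
x <- data.frame(
  var1 = 1:3, var2 = 2:4, var3 = 3:5,
  datee = as.Date(c("2016-03-01", "2018-07-15", "2020-11-30"))
)
y <- data.frame(var1 = 2, var3 = 7)

# Weight every numeric column: y's value where one exists, otherwise 1
num_cols <- names(x)[vapply(x, is.numeric, logical(1))]
w <- setNames(rep(1, length(num_cols)), num_cols)
w[names(y)] <- unlist(y)

# Assigning as a new column keeps each result in the same row as its date
x$desired_output <- drop(as.matrix(x[num_cols]) %*% w)
x$desired_output
# [1] 25 35 45
```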

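Solution:

# Drop the Date column for the matrix product, then bind the result back on;
# cbind() keeps the original row order, so each result stays next to its date
desired_output = compute_desired_output(x = x |> select(-datee), y = y)
df_output = x |>
  cbind(desired_output = as.vector(desired_output))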

Yes, this should really be the solution. It's readable, simple, transparent linear algebra; everyone can see what's going on at first glance. The other solutions are less intuitive and add complexity.

Bravo StatSteph 🙂

Second that! This is by far the most intuitive and concise solution!
