How would I execute this algebra operation?

pathos · August 6, 2021, 9:22am

I'm not sure if there is a specific name for it, but this is what I would like to do (where y is just one row of data):

gueyenono · August 6, 2021, 9:33am

Does this help?

x <- data.frame(
  var1 = 1:3,
  var2 = 2:4,
  var3 = 3:5
)

y1 <- 2
y2 <- 7

x$desired_output <- (x$var1 * y1) + x$var2 + (x$var3 * y2)

x

  var1 var2 var3 desired_output
1    1    2    3             25
2    2    3    4             35
3    3    4    5             45

pathos · August 6, 2021, 10:02am

Ah thanks, but there could be any number of variables, so manually hard-coding each variable wouldn't work.

gueyenono · August 6, 2021, 10:24am

Oh okay, well, it was not really clear at first why the values in the y data were named var1 and var3. It is because they are used to multiply the values in the columns of the same names in x.

I tried to expand the data a little bit. Also, I am not sure you are familiar with matrix algebra, but there's a lot of it being done here actually. So, here is my solution:

x <- data.frame(
  var1 = 1:3,
  var2 = 2:4,
  var3 = 3:5,
  var4 = 4:6,
  var5 = 5:7
)

y <- data.frame(
  var1 = 2,
  var3 = 7,
  var5 = 3
)

compute_desired_output <- function(x, y){
  common_cols <- intersect(colnames(x), colnames(y))
  uncommon_cols <- setdiff(colnames(x), colnames(y))
  
  m1 <- as.matrix(x[, common_cols]) %*% t(as.matrix(y[, common_cols]))
  m2 <- apply(t(t(x[, uncommon_cols])), 1, sum)
  
  as.vector(m1) + m2
}

compute_desired_output(x = x, y = y)

[1] 44 58 72

Yarnabrina · August 6, 2021, 10:49am

I'd suggest using rowSums here.

Something like this will work as well, I think.

compute_desired_output <- function(x, y){
    uncommon_cols <- setdiff(colnames(x), colnames(y))

    y[uncommon_cols] <- 1
    y <- y[colnames(x)]
    
    as.matrix(x) %*% t(y)
}
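
For comparison, here is a minimal sketch of how the rowSums idea could be applied to the earlier function. The name compute_with_rowsums is made up for illustration, and the sketch assumes that every column of x is numeric and that y has a single row.

```r
x <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 3:5, var4 = 4:6, var5 = 5:7)
y <- data.frame(var1 = 2, var3 = 7, var5 = 3)

# Hypothetical helper (not from the thread): weighted columns go through a
# matrix product, unweighted columns are simply summed with rowSums().
compute_with_rowsums <- function(x, y) {
  common_cols <- intersect(colnames(x), colnames(y))
  uncommon_cols <- setdiff(colnames(x), colnames(y))

  # drop = FALSE keeps a data frame even if only one column is selected
  coefs <- unlist(y[1, common_cols, drop = FALSE])
  weighted <- as.matrix(x[, common_cols, drop = FALSE]) %*% coefs
  unweighted <- rowSums(x[, uncommon_cols, drop = FALSE])

  as.vector(weighted) + unweighted
}

result <- compute_with_rowsums(x = x, y = y)
```

On the example data this reproduces the values 44, 58 and 72 from the earlier post.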

pathos · August 6, 2021, 4:40pm

Thanks, clever solutions!
I tried it, but the last part throws this error: requires numeric/complex matrix/vector arguments

So after a bit of searching online, I converted both of them with as.matrix, like so: as.matrix(x) %*% t(as.matrix(y)), and it still throws the same error.

Any idea why this might be? I'm not sure why they're not recognised as matrices.

nirgrahamuk · August 6, 2021, 5:10pm

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners


StatSteph · August 6, 2021, 5:39pm

I think you're missing a coefficient of 1 for var2 in y. The solution is much simpler than that: this is just matrix multiplication.

x <- matrix(c(1,2,3,2,3,4,3,4,5), nrow=3, byrow=TRUE)
y <- matrix(c(2, 1, 7), nrow=1)

x %*% t(y)
#>      [,1]
#> [1,]   25
#> [2,]   35
#> [3,]   45

Created on 2021-08-06 by the reprex package (v2.0.0)

pathos · August 9, 2021, 5:57am

It seems like the code breaks down when there is a date variable, as shown in the reprex below. I'm now guessing that the unmatched variables in x have to be set aside and then stitched back on afterwards with left_join, full_join, or cbind. Or is there a more elegant solution?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

x = data.frame(
  var1 = 1:3,
  var2 = 2:4,
  var3 = 3:5,
  var4 = 4:6,
  var5 = 5:7,
  yearr = sample(2015:2021, 3, replace = TRUE),
  monthh = sample(1:12, 3, replace = TRUE),
  dayy = sample(1:28, 3, replace = TRUE)) |>
    mutate(datee = ymd(paste(yearr, monthh, dayy))) |>
    select(-yearr, -monthh, -dayy)

y = data.frame(
  var1 = 2,
  var3 = 7,
  var5 = 3
)

compute_desired_output = function(x, y){
  uncommon_cols = setdiff(colnames(x), colnames(y))
  
  y[uncommon_cols] = 1
  y = y[colnames(x)]
  
  as.matrix(x) %*% t(y)
}

compute_desired_output(x = x, y = y)
#> Error in as.matrix(x) %*% t(y): requires numeric/complex matrix/vector arguments

Created on 2021-08-09 by the reprex package (v2.0.0)

gueyenono · August 9, 2021, 6:30am

This is algebra, I am not sure why you would put a date in the data. You cannot do algebra with dates. Or is there anything I am not understanding clearly?
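
To make the cause of the error concrete, here is a small sketch with made-up data (the three dates are arbitrary placeholders, not the ones from the reprex). It shows that as.matrix() on a data frame containing a Date column yields a character matrix, which %*% cannot multiply, and that keeping only the numeric columns gives a numeric matrix again.

```r
x <- data.frame(
  var1 = 1:3,
  var2 = 2:4,
  datee = as.Date(c("2020-01-15", "2019-06-03", "2021-11-27"))
)

# With a Date column present, as.matrix() falls back to a character matrix,
# which %*% rejects.
matrix_mode <- mode(as.matrix(x))

# Flag the columns that can take part in the multiplication
is_num <- vapply(x, is.numeric, logical(1))

# Restricting to those columns gives a numeric matrix again
numeric_mode <- mode(as.matrix(x[, is_num, drop = FALSE]))
```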

pathos · August 9, 2021, 7:10am

Ah, this is just a smaller part of a bigger problem.

Well, now my line of thinking is that since the dates/row order has to be preserved, *_join wouldn't work, so I guess I will have to use cbind.

Solution:

desired_output = compute_desired_output(x = x |> select(-datee), y = y)
df_output = x |>
  cbind(desired_output = as.vector(desired_output))
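
As a possible generalisation, the sketch below uses base R only and picks out the numeric columns automatically, so a date column (or any other non-numeric column) is carried along untouched and the row order is preserved. The function name add_desired_output and the example dates are assumptions made for illustration. Numeric columns without a match in y get a weight of 1, as in the earlier answers.

```r
x <- data.frame(
  var1 = 1:3, var2 = 2:4, var3 = 3:5, var4 = 4:6, var5 = 5:7,
  datee = as.Date(c("2020-01-15", "2019-06-03", "2021-11-27"))
)
y <- data.frame(var1 = 2, var3 = 7, var5 = 3)

# Hypothetical helper (not from the thread)
add_desired_output <- function(x, y) {
  # Only numeric columns take part in the multiplication
  numeric_cols <- names(x)[vapply(x, is.numeric, logical(1))]

  # Default weight of 1, overwritten where y supplies a coefficient
  weights <- setNames(rep(1, length(numeric_cols)), numeric_cols)
  matched <- intersect(numeric_cols, names(y))
  weights[matched] <- unlist(y[1, matched, drop = FALSE])

  # Matrix product of the numeric block with the weight vector
  product <- as.matrix(x[, numeric_cols, drop = FALSE]) %*% weights
  x$desired_output <- as.vector(product)
  x
}

df_output <- add_desired_output(x, y)
```

Because the result is written back as a new column of x, no join or cbind step is needed afterwards.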

olibravo · August 10, 2021, 8:32pm

Yes, this should really be the solution. It's readable, simple and transparent linear algebra, and everyone knows what's going on at first glance. The rest of the solutions are non-intuitive and add complexity.

Bravo StatSteph

mafw · August 12, 2021, 1:32pm

Second that! This is by far the most intuitive and concise solution!
