Hello, I am learning sparklyr so thanks for your patience and help.
I have read in some data using spark_read_csv(), and I used dplyr's mutate() as well as sdf_mutate() to create some new variables, for example:
match_cat2 <- match_cat %>%
  mutate(Var_C = as.character(VarC)) %>%
  mutate(Var_B_Avg = (VarB1 + VarB2)/2) %>%
  sdf_mutate(
    Var_C = ft_string_indexer(VarC),
    Var_D = ft_string_indexer(VarD)
  )
sdf_register(match_cat2, "match_cat2")
Now I'm trying to create some more variables, for example:
match_cat3 <- match_cat2 %>%
  group_by(VarE, VarF) %>%
  mutate(Var_G = if (any(Var_C == 1)) ((VarG - VarG[Var_C == 1])/(Var_G + Var_G[Var_C == 1])/2) else NA)
However, I am getting an error that the column Var_G cannot be found in match_cat2:
Error in eval_bare(call, env) : object 'Var_G' not found
It's confusing me, since I can see the column Var_G in the Spark table match_cat2 in the "Connections" tab.