Help on - Loop - argument is of length zero

#8

Hi Alex,

Thank you for the reply and the code. I understand your point. One issue is that the two datasets don't have a common primary key; they are totally different. Will that make any difference to the code you provided?

Regards,
Srinivas

0 Likes

#9

You can use the merge function to do a cross-product join:

df1 <- data.frame(x = 1:3)
df2 <- data.frame(y = c("a", "b", "c"))
merge(df1, df2, by = NULL)
#   x y
# 1 1 a
# 2 2 a
# 3 3 a
# 4 1 b
# 5 2 b
# 6 3 b
# 7 1 c
# 8 2 c
# 9 3 c

Note the argument by = NULL. This prevents merge() from being "helpful" by deciding that shared column names are keys for merging.

0 Likes
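To connect the cross join above to the counting problem in this thread, here is a minimal sketch: build every pair of rows, keep the pairs that satisfy a condition, and count them per row of the first data frame. The data frames `left` and `right` and their columns are hypothetical, not taken from the thread.

```r
# Hypothetical toy data; names are illustrative only.
left  <- data.frame(id = 1:3, skill = c("Art", "maths", "maths"),
                    stringsAsFactors = FALSE)
right <- data.frame(skill_avail = c("Art", "maths", "maths", "science"),
                    stringsAsFactors = FALSE)

# by = NULL pairs every row of `left` with every row of `right` (3 x 4 = 12 rows).
pairs <- merge(left, right, by = NULL)

# Keep only the pairs that satisfy the condition.
hits <- pairs[pairs$skill == pairs$skill_avail, ]

# Count hits per `left` row; the factor levels keep rows with zero hits.
left$out <- as.integer(table(factor(hits$id, levels = left$id)))
left$out  # 1 2 2
```

Keep in mind that the cross join has nrow(left) * nrow(right) rows, so it can get large quickly.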

#10

Hi @alexkgold

Thank you for the reply and the code. I understand your point. One issue is that the two datasets don't have a common primary key; they are totally different. Will that make any difference to the code you provided?

Regards,
Srinivas

0 Likes

#11

Hi @srini,

I don't believe so.

The code counts the number of times the values match, so I don't think primary keys would be relevant.

That said, it's hard to be confident without a full reprex.

0 Likes

#12

Hi @nwerth,

Thank you so much for the clarification. In my situation, each row of df1 has to look up its filter conditions in df2 and, if a condition is satisfied, count it (that is, count the number of times that particular condition is satisfied). Will the merge still work?

Regards,
Srinivas

0 Likes

#13

hi @alexkgold

Thank you. Sure, I will provide the dummy data. Meanwhile, I tried to run the code you shared and the console just shows "+". Am I missing something here?

df1 %>%
  + mutate(row_id = row_number()) %>%
  + group_by(row_id) %>%
  + mutate(out = sum(Skill == df2$Skill & 
               +   ((job == df2$jb | job + 1 == df2$jb) &
                 +  conditions == "L" &
                   +Work_Location == df2$Base.country &
                   +Availability.date > lessmonth1 &
                   +greatmonth1 >= Availability.date) %>%
  + ungroup()
+
+
0 Likes
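A note on the "+" in the post above: it is R's continuation prompt, which the console prints when the expression typed so far is incomplete. In the pasted snippet, the parentheses opened by mutate( and sum( do not appear to be closed, so R keeps waiting for more input. The sketch below uses hypothetical one-line expressions, stored as text, to show the difference between an incomplete and a complete expression; `is_complete` is an illustrative helper, not part of any package.

```r
# Hypothetical snippets, stored as text so their completeness can be checked.
unbalanced <- "sum(1 == 1 & ((2 == 2 | 3 == 3) & 4 == 4)"
balanced   <- "sum(1 == 1 & ((2 == 2 | 3 == 3) & 4 == 4))"

# parse() raises an error on incomplete input; at the interactive console
# R prints "+" instead and waits for the rest of the expression.
is_complete <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

is_complete(unbalanced)       # FALSE: one "(" is never closed
is_complete(balanced)         # TRUE
eval(parse(text = balanced))  # 1
```

The "+" characters are not part of the code, so they should be left out when copying console output back into a script.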

#15

Hi @alexkgold

The dummy data for df1:

skill      conditions  Job cat  work_location  Date created
Art        L           2        IND            1/30/2016
science    E           3        NZ             2/27/2017
maths      L           4        CHI            3/20/2018
maths      L           5        SWT            4/22/2017
sciencce   L           6        IND            5/26/2018

And for df2:

Job cat  Base Country  Availability date  skill
2        IND           1/30/2016          Art
3        NZ            7/22/2017          science
4        NZ            10/30/2018         maths
5        SWT           12/26/2017         maths
3        IND           6/25/2016          sciencce
2        IND           2/21/2016          maths
3        IND           12/21/2015         maths
3        IND           10/21/2015         sciencce
5        SWT           1/22/2017          maths
5        SWT           7/22/2017          maths
5        SWT           11/22/2016         sciencce
0 Likes

#16

This is not in a copy/paste-friendly format. Could you please turn it into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

1 Like

#17

Hi @alexkgold,

Hi @andresrcs

Please find the reproducible code below. Please let me know if I missed anything.

library(lubridate)
df1 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 6L),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce")),
                  conditions = as.factor(c("L", "E", "L", "L", "L")),
                  work_location = as.factor(c("IND", "NZ", "CHI", "SWT", "IND")),
                  Date.created = as.factor(c("1/30/2016", "2/27/2017", "3/20/2018",
                                             "4/22/2017", "5/26/2018")))

df2 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 3L, 2L, 3L, 3L, 5L, 5L, 5L),
                  Base.Country = as.factor(c("IND", "NZ", "NZ", "SWT", "IND", "IND", "IND",
                                             "IND", "SWT", "SWT", "SWT")),
                  Date.Available = as.factor(c("1/30/2016", "7/22/2017", "10/30/2018",
                                               "12/26/2017", "6/25/2016", "2/21/2016",
                                               "12/21/2015", "10/21/2015", "1/22/2017", "7/22/2017",
                                               "11/22/2016")),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce",
                                      "maths", "maths", "sciencce", "maths", "maths",
                                      "sciencce")))
df1$Date.created <- mdy(df1$Date.created)
df2$Date.Available <- mdy(df2$Date.Available)

# To view the lesser and greater 6 month range
df1$lessmonth <- as.Date(df1$Date.created) %m-% months(6)
df1$greatmonth <- as.Date(df1$Date.created) %m+% months(6)

for (i in 1:nrow(df1)){
  count <- 0
  for (j in 1:nrow(df2)){
    if((df1[i, "skill"] == df2[j,"skill"])&
       (df1[i, "job.cat"] == df2[j, "job.cat"] | df1[i, "job.cat"] +1 ==  df2[j, "job.cat"])&
       ((df1[i, "conditions"]) == "L")&
       (df1[i, "work_location"] == df2[j, "Base.Country"])&
       else if ((df1[i, "conditions"]) == "E""] then [(df1[i, "work_location"] == df2[j, "work_location])&
       (df2[j, "Date.Available"] >= df1[i, "lessmonth1"] & df2[j, "Date.Available"] <= df1[i, "greatmonth1"])
    )
      count <- count + 1
  }
  print(count)
  df1[i, "out"] <- count
  
}

#error Error during wrapup: unexpected 'else' in:
"       (df1[i, "work_location"] == df2[j, "Base.Country"])&
       else"
0 Likes

#18

Hi @nwerth

Thank you for the explanation.

Regards,
Srinivas

0 Likes

#19

Hi Team,

Any help on this would be highly appreciated. Thanks for your time.

Regards,
Srini

0 Likes

#20

I'd suggest not expecting others to help as soon as you post a question. Also, keep in mind the time differences between different parts of the world.

Having said that, you've confused Job.cat with job.cat in your code (R is case-sensitive). Also, inside the loop you appended a 1 to both lessmonth and greatmonth, but the columns are named lessmonth and greatmonth.

Also, this part is clearly wrong:

else if ((df1[i, "conditions"]) == "E""] then [(df1[i, "work_location"] == df2[j, "work_location])&

There's no if for this else to attach to and, as far as I know, there's no then in R. This condition was absent from your original code. Are you sure it belongs here? conditions can't be both E and L at the same time.

And please place your code inside a pair of ```.

Working code:
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

df1 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 6L),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce")),
                  conditions = as.factor(c("L", "E", "L", "L", "L")),
                  work_location = as.factor(c("IND", "NZ", "CHI", "SWT", "IND")),
                  Date.created = as.factor(c("1/30/2016", "2/27/2017", "3/20/2018", "4/22/2017", "5/26/2018")))
df2 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 3L, 2L, 3L, 3L, 5L, 5L, 5L),
                  Base.Country = as.factor(c("IND", "NZ", "NZ", "SWT", "IND", "IND", "IND", "IND", "SWT", "SWT", "SWT")),
                  Date.Available = as.factor(c("1/30/2016", "7/22/2017", "10/30/2018", "12/26/2017", "6/25/2016", "2/21/2016", "12/21/2015", "10/21/2015", "1/22/2017", "7/22/2017", "11/22/2016")),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce", "maths", "maths", "sciencce", "maths", "maths", "sciencce")))

df1$Date.created <- mdy(df1$Date.created)
df2$Date.Available <- mdy(df2$Date.Available)

# To view the lesser and greater 6 month range
df1$lessmonth <- as.Date(df1$Date.created) %m-% months(6)
df1$greatmonth <- as.Date(df1$Date.created) %m+% months(6)

for (i in 1:nrow(df1)){
  count <- 0
  for (j in 1:nrow(df2)){
    if((df1[i, "skill"] == df2[j, "skill"])&
       (df1[i, "Job.cat"] == df2[j, "Job.cat"] | df1[i, "Job.cat"] +1 == df2[j, "Job.cat"])&
       ((df1[i, "conditions"]) == "L")&
       (df1[i, "work_location"] == df2[j, "Base.Country"])&
       #((df1[i, "conditions"]) == "E") & (df1[i, "work_location"] == df2[j, "work_location"])&
       (df2[j, "Date.Available"] >= df1[i, "lessmonth"] & df2[j, "Date.Available"] <= df1[i, "greatmonth"]))
      count <- count + 1
  }
  print(count)
  df1[i, "out"] <- count
}
#> [1] 1
#> [1] 0
#> [1] 0
#> [1] 2
#> [1] 0
2 Likes
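As a side note on the Job.cat / job.cat mix-up mentioned above, the sketch below shows one way a misspelled column name can lead to the "argument is of length zero" error from the thread title. The one-row data frame `df` is hypothetical. It uses `$` indexing; with `[` indexing, as in the loop above, R instead reports "undefined columns selected".

```r
# Hypothetical one-row data frame; only the column name matters here.
df <- data.frame(Job.cat = 2L)

df$Job.cat        # 2
df$job.cat        # NULL: names are case-sensitive, so this column does not exist
df$job.cat == 2L  # logical(0): a comparison with NULL has length zero

# if() needs a single TRUE/FALSE, so a length-zero condition is an error.
msg <- tryCatch(
  if (df$job.cat == 2L) "match" else "no match",
  error = function(e) conditionMessage(e)
)
msg  # "argument is of length zero"

# Column names can be checked up front, before entering a loop.
needed <- c("Job.cat", "job.cat")
missing_cols <- setdiff(needed, names(df))
missing_cols  # "job.cat"
```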

#21

Hi @Yarnabrina,

Thank you so much for the working code. I completely understand that everyone is working in different time zones and that it's not possible to reply right away.

Yes, I understand your point that if and then can't be used that way. But I have a filter in which, if conditions is "L", the work location must be the same as the base country. For example, if conditions is "L" and the work location is India, then only rows with India under Base country are valid; but if conditions is "E" and the work location is India, then any country under Base country can be valid. That's the reason I was using both.

Is there any alternative to this?

Thanks and Regards,
Srinivas

0 Likes

#22

I'm not sure I understand exactly how you're using the aforementioned filter here.

((df1[i, "conditions"]) == "L")&
  (df1[i, "work_location"] == df2[j, "Base.Country"])&
  else if ((df1[i, "conditions"]) == "E""] then [(df1[i, "work_location"] == df2[j, "work_location])

Are you looking for something like the following? (...'s are the other filters)

... &
(((df1[i, "conditions"] == "L") & (df1[i, "work_location"] == df2[j, "Base.Country"])) | ((df1[i, "conditions"] == "E") & (df1[i, "work_location"] == "IND"))) &
...
If so, you can check this code.
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

df1 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 6L),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce")),
                  conditions = as.factor(c("L", "E", "L", "L", "L")),
                  work_location = as.factor(c("IND", "NZ", "CHI", "SWT", "IND")),
                  Date.created = as.factor(c("1/30/2016", "2/27/2017", "3/20/2018", "4/22/2017", "5/26/2018")))
df2 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 3L, 2L, 3L, 3L, 5L, 5L, 5L),
                  Base.Country = as.factor(c("IND", "NZ", "NZ", "SWT", "IND", "IND", "IND", "IND", "SWT", "SWT", "SWT")),
                  Date.Available = as.factor(c("1/30/2016", "7/22/2017", "10/30/2018", "12/26/2017", "6/25/2016", "2/21/2016", "12/21/2015", "10/21/2015", "1/22/2017", "7/22/2017", "11/22/2016")),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce", "maths", "maths", "sciencce", "maths", "maths", "sciencce")))

df1$Date.created <- mdy(df1$Date.created)
df2$Date.Available <- mdy(df2$Date.Available)

# To view the lesser and greater 6 month range
df1$lessmonth <- as.Date(df1$Date.created) %m-% months(6)
df1$greatmonth <- as.Date(df1$Date.created) %m+% months(6)

for (i in 1:nrow(df1)){
  count <- 0
  for (j in 1:nrow(df2)){
    if((df1[i, "skill"] == df2[j, "skill"])&
       (df1[i, "Job.cat"] == df2[j, "Job.cat"] | df1[i, "Job.cat"] +1 == df2[j, "Job.cat"])&
       (((df1[i, "conditions"] == "L") & (df1[i, "work_location"] == df2[j, "Base.Country"])) | ((df1[i, "conditions"] == "E") & (df1[i, "work_location"] == "IND"))) &
       (df2[j, "Date.Available"] >= df1[i, "lessmonth"] & df2[j, "Date.Available"] <= df1[i, "greatmonth"]))
      count <- count + 1
  }
  print(count)
  df1[i, "out"] <- count
}
#> [1] 1
#> [1] 0
#> [1] 0
#> [1] 2
#> [1] 0
0 Likes

#23

Hi @Yarnabrina,

Thanks for the reply and the code; it's something similar. Please find the conditions below. They differ depending on whether conditions is L or E.

1. For example, say the code takes the first row of df1, where Job.cat is 2, skill is Art, conditions is L, and work location is NL. It then goes to df2 and checks whether these conditions are satisfied; if so, the new count is added to df1. In this situation only NL should be selected, since the condition is Local. If the work location in the next row is India, then only India should be selected under Base.Country.

2. Now say the next row of df1 has Job.cat 2, skill science, conditions E, and work location NL. The code goes to df2 and checks whether the conditions are satisfied, but in this situation any country under Base.Country can be part of the count; there is no restriction on the base country.

0 Likes
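One way to read the rule described above is as a single logical expression: "L" requires the base country to equal the work location, while "E" accepts any base country. The sketch below is only an interpretation of that description; `location_ok` and its arguments are hypothetical names, and the inputs are assumed to be character values (factors would need as.character() first).

```r
# Hypothetical helper implementing the location rule as described:
#   "L" (local): base country must equal the work location.
#   "E":         any base country is accepted.
location_ok <- function(condition, work_location, base_country) {
  (condition == "L" & work_location == base_country) | condition == "E"
}

base_countries <- c("IND", "NZ", "SWT")
location_ok("L", "IND", base_countries)  # TRUE FALSE FALSE
location_ok("E", "IND", base_countries)  # TRUE TRUE TRUE
```

Because & and | work element-wise, the same expression can be dropped into a loop over single values or applied to a whole column at once.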

#25

First of all, English isn't my native language, so I find long descriptions difficult to follow. I'd guess that if((df1[i, "Job.cat"] == 2) & (df1[i, "skill"] == "Art") & (df1[i, "conditions"] == "L") & (df1[i, "work_location"] == "NL")) represents the conditions you describe in point 1, but as far as I can see, "NL" never appears as a work_location, and I really don't understand what you say about df2 immediately afterwards. So you'll have to complete the rest.

Second, I already gave you a working solution to your original question, plus a modification for the further conditions. Surely you can generalise from that to add more conditions as you need, can't you?

Please don't expect us to code for you. If you run into problems, people here will certainly try to point out the mistakes, but in the end you'll have to solve your problems yourself.

I hope you understand my point and take it in good spirit. Good luck!

1 Like

#26

Hi @Yarnabrina,

Thank you so much for all the support and guidance. It was very helpful, and I really appreciate your time.
Yes, I understand your point. I don't expect people to code for me; that's why I pasted my code earlier, so that you could help me with the mistakes. However, I wasn't sure whether I had communicated my problem clearly, so I posted a longer description than expected.

0 Likes

#27

Hi all,

Thanks for helping out. The R code with the for loop works fine on the few records I tested. However, with the large data it runs for hours. I understand the suggestion to use the dplyr filter; however, it isn't working and gives this error:

Error in rank(x, ties.method = "first", na.last = "keep") :
argument "x" is missing, with no default

0 Likes
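On the run time: one option that stays in base R is to keep the loop over df1 but replace the inner loop over df2 with vectorised comparisons. The sketch below uses the thread's data, with several assumptions: columns are stored as character rather than factor (comparing factors with different level sets can raise an error), the "E" rule accepts any base country as described in post #23 (so row 2 counts 1 here, unlike the earlier output), and `shift_months` and `count_matches` are hypothetical helpers. `shift_months` is a base-R stand-in for lubridate's %m-% and %m+% and can differ from them for end-of-month dates.

```r
to_date <- function(x) as.Date(x, format = "%m/%d/%Y")

df1 <- data.frame(
  Job.cat = c(2L, 3L, 4L, 5L, 6L),
  skill = c("Art", "science", "maths", "maths", "sciencce"),
  conditions = c("L", "E", "L", "L", "L"),
  work_location = c("IND", "NZ", "CHI", "SWT", "IND"),
  Date.created = to_date(c("1/30/2016", "2/27/2017", "3/20/2018",
                           "4/22/2017", "5/26/2018")),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Job.cat = c(2L, 3L, 4L, 5L, 3L, 2L, 3L, 3L, 5L, 5L, 5L),
  Base.Country = c("IND", "NZ", "NZ", "SWT", "IND", "IND", "IND",
                   "IND", "SWT", "SWT", "SWT"),
  Date.Available = to_date(c("1/30/2016", "7/22/2017", "10/30/2018",
                             "12/26/2017", "6/25/2016", "2/21/2016",
                             "12/21/2015", "10/21/2015", "1/22/2017",
                             "7/22/2017", "11/22/2016")),
  skill = c("Art", "science", "maths", "maths", "sciencce", "maths",
            "maths", "sciencce", "maths", "maths", "sciencce"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: shift each date by n calendar months using seq().
shift_months <- function(dates, n) {
  shifted <- lapply(seq_along(dates), function(k) {
    seq(dates[k], by = paste(n, "months"), length.out = 2)[2]
  })
  do.call(c, shifted)
}
df1$lessmonth  <- shift_months(df1$Date.created, -6)
df1$greatmonth <- shift_months(df1$Date.created, 6)

# Hypothetical helper: count the df2 rows matching row i of df1.
# Each comparison runs over all of df2 at once, replacing the inner loop.
count_matches <- function(i) {
  same_skill <- df2$skill == df1$skill[i]
  job_ok <- df2$Job.cat == df1$Job.cat[i] |
    df2$Job.cat == df1$Job.cat[i] + 1L
  place_ok <- df1$conditions[i] == "E" |
    (df1$conditions[i] == "L" & df2$Base.Country == df1$work_location[i])
  in_window <- df2$Date.Available >= df1$lessmonth[i] &
    df2$Date.Available <= df1$greatmonth[i]
  sum(same_skill & job_ok & place_ok & in_window)
}

df1$out <- vapply(seq_len(nrow(df1)), count_matches, integer(1))
df1$out  # 1 1 0 2 0
```

This still makes one pass per row of df1, but each pass is a handful of vectorised comparisons over df2 rather than an R-level inner loop, which is usually considerably faster on large data.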

#28

Hi Guys,

Thank you so much. I finally resolved it and got what I was looking for. It was a good learning exercise.

Regards,
sri

3 Likes
