Help with a loop: "argument is of length zero"

Hi,

I am trying to run an if statement inside a for loop. The data has a date column, which I have formatted, and it looks fine. When I execute the code, it gives me the error "argument is of length zero". I checked the data for missing or NA values, but there are none, since I am working with sample data of at most 50 observations.

Can anyone help?

Thanks

It would help everyone if you provided a reprex (see the FAQ: What's a reproducible example (`reprex`) and how do I do one?).
That will allow us to actually solve the problem you have.

Hi Team,

Please find the code below:

df1$Date_Created <- as.Date(df1$Date_Created,"%m/%d/%Y")
df2$Availability.Date=as.Date(df2$Availability.Date, "%m/%d/%Y")
df1$lessmonth1 <- as.Date(df1$Date_Created, "%m/%d/%Y") %m-% months(6)
df1$greatmonth1 <- as.Date(df1$Date_Created, "%m/%d/%Y") %m+% months(6)





for (i in 1:nrow(df1)){
  count <- 0
  for (j in 1:nrow(df2)){
    if((df1[i, "Skill"] == df2[j,"Skill"])&
       (df1[i, "job"] == df2[j, "jb"] | df1[i, "job"] +1 ==  df2[j, "jb"])&
       ((df1[i, "conditions"]) == "L")&
       (df1[i, "Work_Location"] == df2[j, "Base.country"])&
       (df2[j, "Availability.date"] >= df1[i, "lessmonth1"] & df2[j, "Availability.date"] <= df1[i, "greatmonth1"])
             
    )
      count <- count + 1
  }
  print(count)
  df1[i, "out"] <- count

}

## It's failing only at the last filter (Availability date)
Error in if ((df1[i, ""] == df2[j, "Primary.Skill.Pool."]) &  : 
  argument is of length zero

I can't know for sure because you are not providing a valid reproducible example (with sample data), but I think your problem is that you can't use [i, ""] for subsetting; see this example:

iris[1,""]
#> NULL
iris[1,]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa

Created on 2019-03-13 by the reprex package (v0.2.1)


Andres, I agree with what you're saying, but I can't see which line could be producing this specific error.

I can't find if ((df1[i, ""] == df2[j, "Primary.Skill.Pool."]) & ... in the code that OP posted.

Hi @srini,

It is a little hard to tell what you're going for without a reprex, but looping over the rows of a data frame with a for loop is usually not a good pattern in R. I think the dplyr code below reproduces your intent, is easier to read, and should run substantially faster because it relies on R's vectorized operations.

library(dplyr)

df1 %>%
  mutate(row_id = row_number()) %>%
  group_by(row_id) %>%
  # Availability.Date is a df2 column, so it is referenced as df2$...
  mutate(out = sum(Skill == df2$Skill &
                   (job == df2$jb | job + 1 == df2$jb) &
                   conditions == "L" &
                   Work_Location == df2$Base.country &
                   df2$Availability.Date >= lessmonth1 &
                   df2$Availability.Date <= greatmonth1)) %>%
  ungroup()
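If dplyr isn't available, the same count can be sketched in base R with `outer()`, which compares every df1 row with every df2 row in one call per filter. The toy data is made up, and the column names (Skill, job, jb, conditions, Work_Location, Base.country, lessmonth1, greatmonth1, Availability.Date) are assumed from the question's code. Text columns are kept as plain character, and dates are converted with `as.numeric()` so that `outer()` works on plain vectors.

```r
# Hypothetical toy data; column names follow the question's code
df1 <- data.frame(
  Skill         = c("Art", "maths", "maths"),
  job           = c(2L, 4L, 4L),
  conditions    = c("L", "L", "E"),
  Work_Location = c("IND", "SWT", "SWT"),
  lessmonth1    = as.Date(c("2015-07-30", "2016-10-22", "2016-10-22")),
  greatmonth1   = as.Date(c("2016-07-30", "2017-10-22", "2017-10-22")),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Skill             = c("Art", "maths", "maths", "maths", "maths"),
  jb                = c(2L, 5L, 5L, 4L, 4L),
  Base.country      = c("IND", "SWT", "SWT", "SWT", "NZ"),
  Availability.Date = as.Date(c("2016-01-30", "2017-01-22", "2017-12-26",
                                "2017-05-01", "2017-05-01")),
  stringsAsFactors = FALSE
)

# Each outer() call gives an nrow(df1) x nrow(df2) logical matrix:
# entry [i, j] answers "does df2 row j pass this filter for df1 row i?"
same_skill <- outer(df1$Skill, df2$Skill, "==")
job_ok     <- outer(df1$job, df2$jb, function(a, b) a == b | a + 1 == b)
same_loc   <- outer(df1$Work_Location, df2$Base.country, "==")
# Dates are compared as numbers (days since 1970-01-01)
after_lo   <- outer(as.numeric(df1$lessmonth1),
                    as.numeric(df2$Availability.Date), "<=")
before_hi  <- outer(as.numeric(df1$greatmonth1),
                    as.numeric(df2$Availability.Date), ">=")

# conditions depends only on the df1 row; a length-nrow(df1) vector is
# recycled down each column, so row i of the matrix gets is_L[i]
is_L <- df1$conditions == "L"

df1$out <- rowSums(same_skill & job_ok & same_loc & after_lo & before_hi & is_L)
df1$out
#> [1] 1 2 0
```

Each filter allocates an nrow(df1) x nrow(df2) matrix, so memory grows with the product of the two row counts; for small and medium tables that is usually a fine trade for the speed.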

I'll start by agreeing with @alexkgold: vectorized is the best approach. Go with that.


While the error message doesn't match the provided code, I think @andresrcs is on the right track. The error message "argument is of length zero" usually means one of two things:

  1. A vector with no elements was provided
  2. A NULL was provided

In this code, both can happen. The loops go through the vectors 1:nrow(df1) and 1:nrow(df2). If either data.frame has no rows, the looped vector would be c(1, 0). And subsetting a data.frame with just 0 for the row and a single column returns a vector of length 0:

iris[0, "Sepal.Length"]
# numeric(0)

You can avoid this by replacing 1:nrow(df1) with seq_len(nrow(df1)). The latter will return an empty vector if df1 has no rows, so the loop's code won't be run.
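Both points can be checked on a throwaway zero-row data frame (the name `empty` is just for illustration): `1:nrow()` counts down to 0, `seq_len()` yields nothing, and row 0 of a single column is a zero-length vector, which `if()` rejects with exactly this message.

```r
# A throwaway data frame with one column and zero rows
empty <- data.frame(x = numeric(0))

1:nrow(empty)          # 1 0: a for loop over this runs twice
seq_len(nrow(empty))   # integer(0): a for loop over this never runs

# Row 0 of a single column is numeric(0), so the comparison is logical(0)
msg <- tryCatch(if (empty[0, "x"] > 1) "big", error = conditionMessage)
msg                    # "argument is of length zero"

iterations <- 0
for (i in seq_len(nrow(empty))) {
  iterations <- iterations + 1
}
iterations             # 0
```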


A NULL can show up if any of the columns named in the code doesn't actually exist in the respective data.frame.

iris[1, "not a column"]
# NULL

Make sure you haven't misspelled a column name and the data is guaranteed to have those names.
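A small sketch of the NULL case follows; the data frame `avail` and the helper `check_columns()` are hypothetical. In the question's loop, the test reads `df2[j, "Availability.date"]`, while the conversion line names the column `Availability.Date`. If the data really uses the capital D, that mismatch alone would likely produce this error, which also fits the report that only the Availability date filter fails.

```r
avail <- data.frame(Availability.Date = as.Date("2017-01-22"))

# Column names are case-sensitive: "Availability.date" is not a column here,
# and [row, "missing name"] quietly returns NULL instead of an error
avail[1, "Availability.date"]

# Comparing NULL with anything gives a zero-length logical vector...
cond <- avail[1, "Availability.date"] >= as.Date("2016-10-22")
length(cond)   # 0

# ...which is exactly what if() rejects
msg <- tryCatch(if (cond) "match", error = conditionMessage)
msg            # "argument is of length zero"

# Hypothetical helper: fail early, with the missing names spelled out
check_columns <- function(data, cols) {
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

check_columns(avail, "Availability.Date")   # passes silently
res <- tryCatch(check_columns(avail, "Availability.date"),
                error = conditionMessage)
res            # "Missing columns: Availability.date"
```

Calling a check like this once before the loop turns a confusing "argument is of length zero" deep inside the loop into an error that names the misspelled column.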

Hi Alex,

Thank you for the reply and the code. I understand your point. One issue is that the two datasets don't have a common primary key; they are totally different. Will that change anything in the code you provided?

Regards,
Srinivas

You can use the merge function to do a cross-product join:

df1 <- data.frame(x = 1:3)
df2 <- data.frame(y = c("a", "b", "c"))
merge(df1, df2, by = NULL)
#   x y
# 1 1 a
# 2 2 a
# 3 3 a
# 4 1 b
# 5 2 b
# 6 3 b
# 7 1 c
# 8 2 c
# 9 3 c

Note the argument by = NULL. This prevents merge() from being "helpful" in deciding shared column names are keys for merging.
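The cross product also gives a way to count matches without any shared key: join everything, keep the pairs that pass the filters, then count how many survived for each df1 row. Below is a minimal base-R sketch with made-up columns (`id`, `skill`, and `skill2` are hypothetical, and `id` is assumed to run from 1 to nrow(df1) so that `tabulate()` can use it). In a real version, step 2 would hold all of the conditions from the loop.

```r
# Hypothetical toy data; id, skill and skill2 are made-up column names
df1 <- data.frame(id = 1:3, skill = c("Art", "maths", "chemistry"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(skill2 = c("Art", "maths", "maths", "science"),
                  stringsAsFactors = FALSE)

# 1. Cross join: every df1 row paired with every df2 row (3 x 4 = 12 rows)
pairs <- merge(df1, df2, by = NULL)

# 2. Keep only the pairs that satisfy the condition(s)
hits <- pairs[pairs$skill == pairs$skill2, ]

# 3. Count hits per df1 row; tabulate() also reports rows with zero hits
df1$out <- tabulate(hits$id, nbins = nrow(df1))
df1$out
#> [1] 1 2 0
```

The joined table has nrow(df1) * nrow(df2) rows, so this approach trades memory for simplicity.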

Hi @alexkgold

Thank you for the reply and the code. I understand your point. One issue is that the two datasets don't have a common primary key; they are totally different. Will that change anything in the code you provided?

Regards,
Srinivas

Hi @srini,

I don't believe so.

The code counts the number of times the values match, so I don't think primary keys would be relevant.

That said, it's hard to be confident without a full reprex.

Hi @nwerth,

Thank you so much for the clarification. In my situation, each row of df1 has to be checked against the filter conditions in df2; if a condition is satisfied, it has to be counted (i.e., count the number of times that particular condition is satisfied). Will the merge still work?

Regards,
Srinivas

hi @alexkgold

Thank you. Sure, I will provide the dummy data. Meanwhile, I tried to run the code you shared, and it keeps showing "+". Am I missing something here?

df1 %>%
  + mutate(row_id = row_number()) %>%
  + group_by(row_id) %>%
  + mutate(out = sum(Skill == df2$Skill & 
               +   ((job == df2$jb | job + 1 == df2$jb) &
                 +  conditions == "L" &
                   +Work_Location == df2$Base.country &
                   +Availability.date > lessmonth1 &
                   +greatmonth1 >= Availability.date) %>%
  + ungroup()
+
+
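The `+` at the start of each line is R's continuation prompt, not part of the code: the console is still waiting for the expression to finish. In the pasted version, `((job` opens one parenthesis too many and `Availability.date) %>%` closes one too few, so two parentheses are left open and the console keeps waiting (pressing Esc in RStudio cancels the pending input). One way to check a snippet is to hand it to `parse()`, which reports an unfinished expression as "unexpected end of input". The strings below are a hypothetical miniature, not the original code:

```r
# A miniature of the same slip: "((" opens one parenthesis more than the
# single ")" at the end closes, so one parenthesis stays open
unbalanced <- "sum(((1 == 1) &
  TRUE)"
balanced <- "sum((1 == 1) &
  TRUE)"

# parse() reports an unfinished expression instead of waiting for more input
msg <- tryCatch(parse(text = unbalanced), error = conditionMessage)
cat(msg, "\n")

# The balanced version parses and evaluates normally
eval(parse(text = balanced))
#> [1] 1
```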

Hi @alexkgold

The dummy data for df1:

skill      conditions  Job cat  work_location  Date created
Art        L           2        IND            1/30/2016
science    E           3        NZ             2/27/2017
maths      L           4        CHI            3/20/2018
maths      L           5        SWT            4/22/2017
sciencce   L           6        IND            5/26/2018

For df2:

Job cat  Base Country  Availability date  skill
2        IND           1/30/2016          Art
3        NZ            7/22/2017          science
4        NZ            10/30/2018         maths
5        SWT           12/26/2017         maths
3        IND           6/25/2016          sciencce
2        IND           2/21/2016          maths
3        IND           12/21/2015         maths
3        IND           10/21/2015         sciencce
5        SWT           1/22/2017          maths
5        SWT           7/22/2017          maths
5        SWT           11/22/2016         sciencce

This is not in a copy/paste-friendly format. Could you please turn it into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:


Hi @alexkgold,

Hi @andresrcs

Please find the reproducible code below. Let me know if I missed anything.

library(lubridate)
df1 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 6L),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce")),
                  conditions = as.factor(c("L", "E", "L", "L", "L")),
                  work_location = as.factor(c("IND", "NZ", "CHI", "SWT", "IND")),
                  Date.created = as.factor(c("1/30/2016", "2/27/2017", "3/20/2018",
                                             "4/22/2017", "5/26/2018")))

df2 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 3L, 2L, 3L, 3L, 5L, 5L, 5L),
                  Base.Country = as.factor(c("IND", "NZ", "NZ", "SWT", "IND", "IND", "IND",
                                             "IND", "SWT", "SWT", "SWT")),
                  Date.Available = as.factor(c("1/30/2016", "7/22/2017", "10/30/2018",
                                               "12/26/2017", "6/25/2016", "2/21/2016",
                                               "12/21/2015", "10/21/2015", "1/22/2017", "7/22/2017",
                                               "11/22/2016")),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce",
                                      "maths", "maths", "sciencce", "maths", "maths",
                                      "sciencce")))
df1$Date.created <- mdy(df1$Date.created)
df2$Date.Available <- mdy(df2$Date.Available)

# To view the lesser and greater 6 month range
df1$lessmonth <- as.Date(df1$Date.created) %m-% months(6)
df1$greatmonth <- as.Date(df1$Date.created) %m+% months(6)

for (i in 1:nrow(df1)){
  count <- 0
  for (j in 1:nrow(df2)){
    if((df1[i, "skill"] == df2[j,"skill"])&
       (df1[i, "job.cat"] == df2[j, "job.cat"] | df1[i, "job.cat"] +1 ==  df2[j, "job.cat"])&
       ((df1[i, "conditions"]) == "L")&
       (df1[i, "work_location"] == df2[j, "Base.Country"])&
       else if ((df1[i, "conditions"]) == "E""] then [(df1[i, "work_location"] == df2[j, "work_location])&
       (df2[j, "Date.Available"] >= df1[i, "lessmonth1"] & df2[j, "Date.Available"] <= df1[i, "greatmonth1"])
    )
      count <- count + 1
  }
  print(count)
  df1[i, "out"] <- count
  
}

#error Error during wrapup: unexpected 'else' in:
"       (df1[i, "work_location"] == df2[j, "Base.Country"])&
       else"

Hi @nwerth

Thank you for the explanation.

Regards,
Srinivas

Hi Team,

Any help on this will be highly appreciated. Thanks for your time,

Regards,
Srini

I'd suggest not expecting others to help as soon as you post a question. Also, keep in mind the time differences between different parts of the world.

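Having said that, you've confused `Job.cat` with `job.cat` in your code (column names are case-sensitive). Also, inside the loop you used `lessmonth1` and `greatmonth1`, but the columns you created are named `lessmonth` and `greatmonth`.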

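Also, this part is clearly wrong:

       else if ((df1[i, "conditions"]) == "E""] then [(df1[i, "work_location"] == df2[j, "work_location])&

That `else` has no matching `if`, and as far as I know, there's no `then` in R. This condition was absent from your original code; are you sure it belongs here? Since all the tests are joined with `&`, a row would need conditions to be both "E" and "L" at the same time, which can't happen.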

And please put your code inside a pair of ``` (triple backticks).

Here is a working version:
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

df1 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 6L),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce")),
                  conditions = as.factor(c("L", "E", "L", "L", "L")),
                  work_location = as.factor(c("IND", "NZ", "CHI", "SWT", "IND")),
                  Date.created = as.factor(c("1/30/2016", "2/27/2017", "3/20/2018", "4/22/2017", "5/26/2018")))
df2 <- data.frame(Job.cat = c(2L, 3L, 4L, 5L, 3L, 2L, 3L, 3L, 5L, 5L, 5L),
                  Base.Country = as.factor(c("IND", "NZ", "NZ", "SWT", "IND", "IND", "IND", "IND", "SWT", "SWT", "SWT")),
                  Date.Available = as.factor(c("1/30/2016", "7/22/2017", "10/30/2018", "12/26/2017", "6/25/2016", "2/21/2016", "12/21/2015", "10/21/2015", "1/22/2017", "7/22/2017", "11/22/2016")),
                  skill = as.factor(c("Art", "science", "maths", "maths", "sciencce", "maths", "maths", "sciencce", "maths", "maths", "sciencce")))

df1$Date.created <- mdy(df1$Date.created)
df2$Date.Available <- mdy(df2$Date.Available)

# To view the lesser and greater 6 month range
df1$lessmonth <- as.Date(df1$Date.created) %m-% months(6)
df1$greatmonth <- as.Date(df1$Date.created) %m+% months(6)

for (i in 1:nrow(df1)){
  count <- 0
  for (j in 1:nrow(df2)){
    if((df1[i, "skill"] == df2[j, "skill"])&
       (df1[i, "Job.cat"] == df2[j, "Job.cat"] | df1[i, "Job.cat"] +1 == df2[j, "Job.cat"])&
       ((df1[i, "conditions"]) == "L")&
       (df1[i, "work_location"] == df2[j, "Base.Country"])&
       #((df1[i, "conditions"]) == "E") & (df1[i, "work_location"] == df2[j, "work_location"])&
       (df2[j, "Date.Available"] >= df1[i, "lessmonth"] & df2[j, "Date.Available"] <= df1[i, "greatmonth"]))
    count <- count + 1
  }
  print(count)
  df1[i, "out"] <- count
}
#> [1] 1
#> [1] 0
#> [1] 0
#> [1] 2
#> [1] 0

Hi @Yarnabrina,

Thank you so much for the working code. I completely understand that everyone works in different time zones and it's not always possible to reply right away.

Yes, I understand your point that `if` and `then` can't be used like that. But I have a filter where, if conditions is "L", the work location must be the same as the base country. For example, if conditions is "L" and the work location is India, then only rows with India as the base country are valid; but if conditions is "E" and the work location is India, then any base country is valid. That's the reason I was using both.

Is there an alternative way to do this?

Thanks and Regards,
Srinivas
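The two cases can be folded into one logical test with no `if`/`then`: a df2 row passes the location check when conditions is "E", or when the work location equals the base country. Below is a base-R sketch on the sample data from this thread, assuming conditions only ever holds "L" or "E" (any other code would be treated like "L"). Text columns are kept as character rather than factor, because `==` between two factors can stop with an error when their level sets differ (work_location includes CHI, Base.Country does not). `add_months()` is a hypothetical base-R stand-in for lubridate's `%m+%`/`%m-%`; unlike lubridate, it lets a day that doesn't exist in the target month (such as the 31st) spill into the following month, which doesn't affect these dates.

```r
# Sample data from the thread, with text kept as character (not factor)
df1 <- data.frame(
  Job.cat       = c(2L, 3L, 4L, 5L, 6L),
  skill         = c("Art", "science", "maths", "maths", "sciencce"),
  conditions    = c("L", "E", "L", "L", "L"),
  work_location = c("IND", "NZ", "CHI", "SWT", "IND"),
  Date.created  = as.Date(c("1/30/2016", "2/27/2017", "3/20/2018",
                            "4/22/2017", "5/26/2018"), "%m/%d/%Y"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Job.cat        = c(2L, 3L, 4L, 5L, 3L, 2L, 3L, 3L, 5L, 5L, 5L),
  Base.Country   = c("IND", "NZ", "NZ", "SWT", "IND", "IND", "IND",
                     "IND", "SWT", "SWT", "SWT"),
  Date.Available = as.Date(c("1/30/2016", "7/22/2017", "10/30/2018",
                             "12/26/2017", "6/25/2016", "2/21/2016",
                             "12/21/2015", "10/21/2015", "1/22/2017",
                             "7/22/2017", "11/22/2016"), "%m/%d/%Y"),
  skill          = c("Art", "science", "maths", "maths", "sciencce",
                     "maths", "maths", "sciencce", "maths", "maths",
                     "sciencce"),
  stringsAsFactors = FALSE
)

# Hypothetical base-R stand-in for lubridate's %m+% / %m-%
add_months <- function(d, n) {
  lt <- as.POSIXlt(d)
  lt$mon <- lt$mon + n
  as.Date(lt)
}
df1$lessmonth  <- add_months(df1$Date.created, -6)
df1$greatmonth <- add_months(df1$Date.created, 6)

# Count the df2 rows that satisfy every filter for row i of df1
count_matches <- function(i) {
  row <- df1[i, ]
  sum(row$skill == df2$skill &
      (row$Job.cat == df2$Job.cat | row$Job.cat + 1 == df2$Job.cat) &
      # "E": any base country is accepted; otherwise ("L") they must match
      (row$conditions == "E" | row$work_location == df2$Base.Country) &
      df2$Date.Available >= row$lessmonth &
      df2$Date.Available <= row$greatmonth)
}

df1$out <- vapply(seq_len(nrow(df1)), count_matches, integer(1))
df1$out
#> [1] 1 1 0 2 0
```

Compared with the loop above, each df1 row is compared against all of df2 at once, and row 2 (conditions "E") now gets a count because its work location no longer has to match the base country.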