On constructing an empty column in a data frame and checking whether it is still empty

I would like to declare a column "InformationYetToEnter" in a data frame that holds the concatenation of several variables under certain conditions. The column should start out empty, and later steps should depend on whether it is still empty. Consider the code below: I concatenate the values of some variables, and any row whose concatenated value does not occur often enough is dropped.
I would like to initialize the empty column with NA, as below. However, the concatenation then returns something like "NA 1 Good" instead of just "1 Good". How do I fix this?
How can I then check whether the column is still empty during the algorithm, given that

Fruits<-Fruits %>% 
  mutate(InformationYetToEnter = fct_lump_min(InformationYetToEnter, 2, other_level = "Too Rare")) %>%
  filter (InformationYetToEnter != "Too Rare")

returns an error if it is empty and, assuming I cannot use NA, is.na.data.frame() no longer works? NAs in the actual data should not be erased: checking for NA on every iteration and deleting it before the concatenation would cause problems when the first variable itself contains a genuine NA, which must be kept.

library(tidyverse)

Fruit<-c("Banana", "Apple", "Banana", "Apple", "Apple")
Origin<-c("New Guinea", "China","Germany", "USA", "Germany")
Quality<-c("Good", "Bad", "Good", "Very bad", "Decent")
Value<-c(50,75,80,60,30) #cents
Price<-c(1,2,1,3,1)     #euros
InformationYetToEnter<-c(NA,NA,NA,NA,NA)

Fruits<-data.frame(Fruit, Origin, Quality, Value, Price, InformationYetToEnter)
Fruits$InformationYetToEnter<-paste(Fruits$InformationYetToEnter,Fruits$Price,Fruits$Quality)

Fruits<-Fruits %>% 
  mutate(InformationYetToEnter = fct_lump_min(InformationYetToEnter, 2, other_level = "Too Rare")) %>%
  filter (InformationYetToEnter != "Too Rare")

You can initialize InformationYetToEnter as a character vector of empty strings:

InformationYetToEnter<-c('','','','','')
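
As a rough illustration of why this helps, and of one possible way to test whether the column is still empty: paste() turns NA into the literal string "NA", while an empty string adds no text of its own. The helper name is_still_empty is made up for this sketch and is not part of any package; it assumes that "" is used as the placeholder, so genuine NAs in the data are never counted as empty.

```r
# paste() converts NA to the string "NA", so an NA placeholder leaks into the result
with_na <- paste(NA, 1, "Good")      # "NA 1 Good"
# An empty string contributes no text, only the separator that follows it
with_empty <- paste("", 1, "Good")   # " 1 Good" (note the leading space)

# Hypothetical helper: TRUE while every entry is still the "" placeholder
is_still_empty <- function(x) all(!is.na(x) & x == "")

placeholder <- rep("", 5)            # same values as c('','','','','')
if (is_still_empty(placeholder)) {
  message("Column is still empty; skipping the lumping step")
}
```

A guard like this could wrap the fct_lump_min() step so that it only runs once something has actually been pasted into the column.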

Thank you very much, it does what I want it to. :)
Unfortunately, I appear to have simplified my example a bit too much, and your solution reveals another problem. In my actual code, I use "; " to separate the concatenated variables. So I should edit my example to:

library(tidyverse)

Fruit<-c("Banana", "Apple", "Banana", "Apple", "Apple")
Origin<-c("New Guinea", "China","Germany", "USA", "Germany")
Quality<-c("Good", "Bad", "Good", "Very bad", "Decent")
Value<-c(50,75,80,60,30) #cents
Price<-c(1,2,1,3,1)     #euros
InformationYetToEnter<-c('','','','','')

Fruits<-data.frame(Fruit, Origin, Quality, Value, Price, InformationYetToEnter)
Fruits$InformationYetToEnter<-paste(Fruits$InformationYetToEnter,Fruits$Price, Fruits$Quality, sep="; ")

Fruits<-Fruits %>% 
  mutate(InformationYetToEnter = fct_lump_min(InformationYetToEnter, 2, other_level = "Too Rare")) %>%
  filter (InformationYetToEnter != "Too Rare")

Now I get entries like "; 1; Good". Is there a way to get rid of the leading semicolon? Telling R to always omit the first semicolon would suffice.

Hi again,

You could try removing the sep = "; " argument from the call to paste, so that it falls back to its default separator, a single space:

library(tidyverse)


Fruit<-c("Banana", "Apple", "Banana", "Apple", "Apple")
Origin<-c("New Guinea", "China","Germany", "USA", "Germany")
Quality<-c("Good", "Bad", "Good", "Very bad", "Decent")
Value<-c(50,75,80,60,30) #cents
Price<-c(1,2,1,3,1)     #euros
InformationYetToEnter<-c('','','','','')

Fruits<-data.frame(Fruit, Origin, Quality, Value, Price, InformationYetToEnter)
Fruits$InformationYetToEnter<-paste(Fruits$InformationYetToEnter,Fruits$Price, Fruits$Quality)

Fruits %>% 
  mutate(InformationYetToEnter = fct_lump_min(InformationYetToEnter, 2, other_level = "Too Rare")) %>%
  filter (InformationYetToEnter != "Too Rare")


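#    Fruit     Origin Quality Value Price InformationYetToEnter
# 1 Banana New Guinea    Good    50     1                1 Good
# 2 Banana    Germany    Good    80     1                1 Good

# Alternative: keep sep = "; " and strip the leading separator afterwards.
# Run this on a freshly built Fruits, where InformationYetToEnter is still ''.
Fruits$InformationYetToEnter<-paste(Fruits$InformationYetToEnter,Fruits$Price, Fruits$Quality, sep="; ")
# Drop the leading "; " (two characters) wherever the string starts with it
Fruits$InformationYetToEnter <- ifelse(test = substr(Fruits$InformationYetToEnter,
                                              1,2)=="; ",
                                       yes = substr(Fruits$InformationYetToEnter,
                                              3,nchar(Fruits$InformationYetToEnter)),
                                       no = Fruits$InformationYetToEnter)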

Fruits<-Fruits %>% 
  mutate(InformationYetToEnter = fct_lump_min(InformationYetToEnter, 2, other_level = "Too Rare")) %>%
  filter (InformationYetToEnter != "Too Rare")
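
If the substr() bookkeeping feels heavy, here is a shorter sketch of the same idea in base R. It only uses the Price and Quality values from the example; the helper append_info is a made-up name for illustration, and it assumes the column starts as "" and contains no NA.

```r
# Sample values taken from the example above
price   <- c(1, 2, 1, 3, 1)
quality <- c("Good", "Bad", "Good", "Very bad", "Decent")

# Option 1: paste first, then strip a leading "; " with an anchored regex
info <- rep("", length(price))
info <- paste(info, price, quality, sep = "; ")
info <- sub("^; ", "", info)

# Option 2 (hypothetical helper): add the separator only when the column
# already holds something, so a leading "; " is never created
append_info <- function(current, new, sep = "; ") {
  new <- as.character(new)
  ifelse(current == "", new, paste(current, new, sep = sep))
}
info2 <- rep("", length(price))
info2 <- append_info(info2, price)
info2 <- append_info(info2, quality)

print(info)
print(info2)
```

Option 2 may suit an algorithm that appends to the column over several iterations, since the column never needs cleaning up afterwards.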

Thank you, that seems tedious to figure out! :)
