Placing Commas Between Names

I am trying to find out if certain patterns appear within a data frame.

Suppose I have the following "dictionary of patterns" (notice "james" vs "jamesj"):

patterns <- c("john", "jack", "james", "jamesj", "jason")

The actual data frame ("data_frame") I have looks like this:

  id                                              names
1  1                                     johnjack jameS
2  2                             john/james, jasonjames
3  3                                    peter_jackjason
4  4                                   jamesjasonj jack
5  5 jamesjjason, johnjasonjohn , jason-jack sam _ peter

The final result I am trying to produce should look like this:

  id                                                         names
1  1                                             john, jack, james
2  2                                     john, james, jason, james
3  3                                            peter, jack, jason
4  4                                          jamesj, asonj,  jack
5  5 jamesj, jason, john, jason, john , jason, jack,  sam ,  peter

I looked at this post (R: insert comma after each element from the output) and tried the answer provided there:

> data_frame$parsed_names = dput(data_frame$names)



  id                                                         names                                                  parsed_names
1  1                                             john, jack, james                                             john, jack, james
2  2                                     john, james, jason, james                                     john, james, jason, james
3  3                                            peter, jack, jason                                            peter, jack, jason
4  4                                          jamesj, asonj,  jack                                          jamesj, asonj,  jack
5  5 jamesj, jason, john, jason, john , jason, jack,  sam ,  peter jamesj, jason, john, jason, john , jason, jack,  sam ,  peter

But this does not correspond to what I wanted.

I then looked at this post (insert commas in text string after certain words in r) and tried the answer provided there:

library(gsubfn)

data_frame$parsed_names = gsubfn("\\w+", as.list(setNames(paste0(patterns, ","), patterns)), 
  format(data_frame$names))

 data_frame
  id                                                         names                                                         parsed_names
1  1                                             john, jack, james     john,, jack,, james,                                            
2  2                                     john, james, jason, james    john,, james,, jason,, james,                                    
3  3                                            peter, jack, jason      peter, jack,, jason,                                           
4  4                                          jamesj, asonj,  jack      jamesj,, asonj,  jack,                                         
5  5 jamesj, jason, john, jason, john , jason, jack,  sam ,  peter jamesj,, jason,, john,, jason,, john, , jason,, jack,,  sam ,  peter

Can someone please show me how to fix this?

Thank you!

I don't quite understand the intended relationship between the patterns and the names.
I initially assumed that the patterns are to be picked out from the names, and that the result would reduce to what was picked out. But I realise this can't be the case: peter isn't in any pattern, yet you pick it out (separate and retain it) in your expected output. Putting aside the code and the input/output example, and speaking human to human, what are we trying to do here?

Aside: if you simply want to know whether each of your 'patterns' appears within a data frame, I would expect a much simpler string-detection setup that returns TRUE or FALSE depending on whether a pattern occurs somewhere in a given names entry.
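
The simpler detection described in the aside could be sketched in base R roughly as follows. This is only a minimal sketch: it rebuilds the sample data from the question, and the name `detected` and the decision to lower-case the text first (so that "jameS" counts as "james") are assumptions, not something stated in the thread.

```r
patterns <- c("john", "jack", "james", "jamesj", "jason")

# Sample data rebuilt from the question
data_frame <- data.frame(
  id = 1:5,
  names = c("johnjack jameS",
            "john/james, jasonjames",
            "peter_jackjason",
            "jamesjasonj jack",
            "jamesjjason, johnjasonjohn , jason-jack sam _ peter"),
  stringsAsFactors = FALSE
)

# Assumption: matching should ignore case, so compare against lower-cased text
lower_names <- tolower(data_frame$names)

# One logical column per pattern: TRUE if the pattern occurs anywhere in the row
detected <- sapply(patterns, function(p) grepl(p, lower_names, fixed = TRUE))
rownames(detected) <- data_frame$id

detected
```

Each column is one pattern and each row one id. Because this is plain substring matching, "james" is also TRUE wherever "jamesj" occurs.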


Thank you for your reply! I want to place a comma between every "name" in each row of the data frame. I have a "patterns" vector that contains some of the names (e.g. peter is not in it). I would like to see if it is possible to add a comma in each row of the data frame between the names that appear in "patterns".

What about names that are not in patterns? Can they be discarded?
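
The thread never settled what should happen to names outside the patterns, so the following is only one possible reading, sketched in base R: names not in `patterns` are kept, every run of non-letter characters is treated as a separator, and longer patterns win over shorter ones (so "jamesj" is tried before "james"). The helper name `split_names` is hypothetical, the lower-casing is an assumption, and the patterns are assumed to contain only plain letters (no regex metacharacters).

```r
patterns <- c("john", "jack", "james", "jamesj", "jason")

split_names <- function(x, patterns) {
  # Longest patterns first, so that "jamesj" is tried before "james"
  by_length <- patterns[order(nchar(patterns), decreasing = TRUE)]
  regex <- paste0("(", paste(by_length, collapse = "|"), ")")

  x <- tolower(x)                             # assumption: "jameS" means "james"
  x <- gsub("[^a-z]+", " ", x)                # each run of non-letters becomes one space
  x <- gsub(regex, " \\1 ", x, perl = TRUE)   # isolate every dictionary name
  tokens <- strsplit(trimws(x), " +")         # split on the remaining spaces
  vapply(tokens, paste, character(1), collapse = ", ")
}

raw_names <- c("johnjack jameS",
               "john/james, jasonjames",
               "peter_jackjason",
               "jamesjasonj jack",
               "jamesjjason, johnjasonjohn , jason-jack sam _ peter")

parsed_names <- split_names(raw_names, patterns)
parsed_names
```

Under these assumptions the fourth row yields "jamesj, asonj, jack", in line with the expected output in the question apart from spacing. If unmatched names should be discarded instead, the tokens could be filtered with `%in% patterns` before pasting.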
