Put specific space between names

M_AcostaCH · November 7, 2022, 6:01am

Hi community

I want to pad each name with spaces according to how many digits it contains.
Each name is encoded as a leading letter, then a 5-character field for the digits, then an optional letter, for example G#####A.

I have a database that does not respect this spacing: the digits are written directly after the leading letter (these are column names).

I tried counting the number of characters and inserting spaces in names2, but it does not work for all rows. Another problem is that the count includes the final letter.

The idea is to get the names as they appear in names_TRUE, where the 5-character field for the digits is respected in each name.

library(dplyr)  # provides %>%, mutate() and case_when()

df <- data.frame(names=c('G1', 'G5', 'G22','G52', 'G768A','G522','G3412',
                         'G3412B' , 'G51323C'))

df$NoCh <- nchar(df$names)

# Attempt: pick the number of spaces from the total character count
df <- df %>%
  mutate(names2 = case_when(
      NoCh == 2 ~ gsub("([G])", "\\1    ", names),
      NoCh == 3 ~ gsub("([G])", "\\1   ", names),
      NoCh == 4 ~ gsub("([G])", "\\1  ", names),
      NoCh == 5 ~ gsub("([G])", "\\1 ", names),
      NoCh == 6 ~ gsub("([G])", "\\1", names),
      NoCh == 7 ~ gsub("([G])", "\\1", names)))

df$names_TRUE <- c('G    1', 'G    5', 'G   22','G   52', 'G  768A','G  522','G 3412',
  'G 3412B' , 'G51323C')

df
# names      NoCh names2   names_TRUE
# 1      G1    2  G    1     G    1
# 2      G5    2  G    5     G    5
# 3     G22    3  G   22     G   22
# 4     G52    3  G   52     G   52
# 5   G768A    5  G 768A    G  768A
# 6    G522    4  G  522     G  522
# 7   G3412    5  G 3412     G 3412
# 8  G3412B    6  G3412B    G 3412B
# 9 G51323C    7 G51323C    G51323C

Thanks

DavoWW · November 7, 2022, 7:17am

Hi @M_AcostaCH,
Can you use just the string length to get the required result?

library(stringr)

df <- data.frame(names=c('G1','G5','G22','G52','G768A','G522','G3412','G3412B','G51323C'))

df$NoCh <- nchar(df$names)
df$NoSpaces_needed <- 7-df$NoCh

df$string_head <- str_sub(df$names, start=1L, end=1L)
df$string_tail <- str_sub(df$names, start=2L, end=df$NoCh)

df$names_true <- paste(df$string_head, 
                       strrep(" ", df$NoSpaces_needed),
                       df$string_tail, sep="")
df
#>     names NoCh NoSpaces_needed string_head string_tail names_true
#> 1      G1    2               5           G           1    G     1
#> 2      G5    2               5           G           5    G     5
#> 3     G22    3               4           G          22    G    22
#> 4     G52    3               4           G          52    G    52
#> 5   G768A    5               2           G        768A    G  768A
#> 6    G522    4               3           G         522    G   522
#> 7   G3412    5               2           G        3412    G  3412
#> 8  G3412B    6               1           G       3412B    G 3412B
#> 9 G51323C    7               0           G      51323C    G51323C

Created on 2022-11-07 with reprex v2.0.2

M_AcostaCH · November 7, 2022, 10:44pm

Hi @DavoWW, this is very close, thanks.

But in row 1, for example, only 4 spaces are needed, because the digit occupies the 5th position. The same goes for the second row. (G    1) (G    5)

The third row only needs 3 spaces, not 4, because the 2 digits occupy the 4th and 5th positions.
(G   22) (G   52).

df$NoSpaces_needed <- 6-df$NoCh  # I changed this to 6. Negative numbers appear; maybe this is the problem.

# Error in strrep(" ", df$NoSpaces_neded) : invalid 'times' value
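The error can be reproduced in isolation. A minimal sketch, assuming three sample names from the data above: strrep() rejects a negative times value, and 6 - nchar() is negative for a 7-character name. Clamping with pmax() is shown only as an illustrative workaround; it avoids the error but still miscounts names that end in a letter.

```r
# Three sample names taken from the data above
x <- c("G1", "G3412B", "G51323C")

# Spaces computed from the total length: 4, 0, -1
needed <- 6 - nchar(x)

# strrep() signals an error as soon as one 'times' value is negative
failed <- inherits(
  tryCatch(strrep(" ", needed), error = function(e) e),
  "error"
)
failed
# TRUE

# Clamping at zero avoids the error ...
spaces <- strrep(" ", pmax(needed, 0))

# ... but G3412B still gets no space, because its final letter was counted
padded <- paste0(substr(x, 1, 1), spaces, substring(x, 2))
padded
# "G    1"  "G3412B"  "G51323C"
```

So the negative value explains the error, while the missing space in G3412B comes from counting the letter together with the digits.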

Let me explain it another way: there are 5 positions for numeric values, and the 6th position holds only a letter.

G numeric-numeric-numeric-numeric-numeric-letter

For example, G    1 contains no letter:

Gspace-space-space-space-1

For G  768A:
Gspace-space-768A

For G 3412B:
Gspace-3412B

# Names with the correct spacing:
names_TRUE
1     G    1
2     G    5
3     G   22
4     G   52
5     G  768A
6     G  522
7     G 3412
8     G 3412B
9     G51323C
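The layout described above can be written down directly as a pattern. This is only a sketch in base R, and it assumes every name is exactly one leading letter, 1 to 5 digits, and at most one trailing letter; pad_name is a hypothetical helper name, not something from the original code.

```r
# Hypothetical helper: right-align the digits in a 5-character field
pad_name <- function(x) {
  # Capture groups: leading letter, digits, optional trailing letter
  pattern <- "^([A-Za-z])([0-9]{1,5})([A-Za-z]?)$"
  parts <- regmatches(x, regexec(pattern, x))
  vapply(parts, function(p) {
    # Names that do not fit the assumed pattern yield NA
    if (length(p) == 0) return(NA_character_)
    # p[1] is the full match; p[2..4] are the three groups
    sprintf("%s%5s%s", p[2], p[3], p[4])
  }, character(1))
}

x <- c('G1', 'G5', 'G22', 'G52', 'G768A', 'G522', 'G3412', 'G3412B', 'G51323C')
names_TRUE <- c('G    1', 'G    5', 'G   22', 'G   52', 'G  768A', 'G  522',
                'G 3412', 'G 3412B', 'G51323C')

identical(pad_name(x), names_TRUE)
# TRUE
```

Because the digits are padded on their own, the trailing letter never affects the number of spaces.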

FJCC · November 7, 2022, 11:27pm

A slight variation on @DavoWW's code.

library(stringr)

df <- data.frame(names=c('G1','G5','G22','G52','G768A','G522','G3412','G3412B','G51323C'))

df$NoDig <- nchar(str_replace_all(df$names,"\\D",""))
df$NoCh <- nchar(df$names)
df$NoSpaces_needed <- 5-df$NoDig

df$string_head <- str_sub(df$names, start=1L, end=1L)
df$string_tail <- str_sub(df$names, start=2L, end=df$NoCh)

df$names_true <- paste(df$string_head, 
                       strrep(" ", df$NoSpaces_needed),
                       df$string_tail, sep="")
df
#>     names NoDig NoCh NoSpaces_needed string_head string_tail names_true
#> 1      G1     1    2               4           G           1     G    1
#> 2      G5     1    2               4           G           5     G    5
#> 3     G22     2    3               3           G          22     G   22
#> 4     G52     2    3               3           G          52     G   52
#> 5   G768A     3    5               2           G        768A    G  768A
#> 6    G522     3    4               2           G         522     G  522
#> 7   G3412     4    5               1           G        3412     G 3412
#> 8  G3412B     4    6               1           G       3412B    G 3412B
#> 9 G51323C     5    7               0           G      51323C    G51323C

Created on 2022-11-07 with reprex v2.0.2

M_AcostaCH · November 8, 2022, 3:11am

Yes, it is a variation of @DavoWW's solution, thanks!

You changed it a little so that the digits are counted separately from the other characters.

These spaces are very important because I need to search for these names in SQL, and the search is very sensitive to the spacing.
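As a rough sketch of that last step, the padded names could be checked and turned into a quoted list before searching. The table name my_table and column name name are hypothetical placeholders, and building the query text by hand is only for illustration; a parameterized query through a database interface is usually the safer choice.

```r
# A few padded names in the target layout
padded <- c("G    1", "G  768A", "G51323C")

# Sanity check: one letter, a 5-character field of spaces/digits, optional letter
layout_ok <- all(grepl("^[A-Z][ 0-9]{5}[A-Z]?$", padded))
layout_ok
# TRUE

# Quote each name as an SQL string literal (single quotes doubled)
quoted <- paste0("'", gsub("'", "''", padded, fixed = TRUE), "'")

# my_table and name are hypothetical identifiers
query <- sprintf("SELECT * FROM my_table WHERE name IN (%s)",
                 paste(quoted, collapse = ", "))
query
# "SELECT * FROM my_table WHERE name IN ('G    1', 'G  768A', 'G51323C')"
```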
