R converting into more friendly names

I have a list of hostnames that I would like to convert to more friendly names in R. Is this possible, please?

Host name
95b4ae6d890e4c46986d91d7ac4bf08200000W
95b4ae6d890e4c46986d91d7ac4bf08200000W
95b4ae6d890e4c46986d91d7ac4bf08200000V
95b4ae6d890e4c46986d91d7ac4bf08200000V
95b4ae6d890e4c46986d91d7ac4bf08200000Z
95b4ae6d890e4c46986d91d7ac4bf08200000Z
95b4ae6d890e4c46986d91d7ac4bf082000011
95b4ae6d890e4c46986d91d7ac4bf082000011
95b4ae6d890e4c46986d91d7ac4bf082000011
95b4ae6d890e4c46986d91d7ac4bf082000011
95b4ae6d890e4c46986d91d7ac4bf08200000H
95b4ae6d890e4c46986d91d7ac4bf08200000H

You could do this in all sorts of ways. What did you have in mind?

You could map each of these to a number. Or you could map each to the name of a former President of the US. Or you could make each of them a noble gas.

I was hoping for host1,host2,host3, and so on. Just to make it more readable.

How is this stored? A list, a vector, a column of a table?
In a nutshell, my idea would be to generate a vector of friendly names, and then cbind it to the table, or pass it into a list.

E.g.

paste0("host", seq_len(10))

gives you this:

[1] "host1"  "host2"  "host3"  "host4"  "host5"  "host6"  "host7"  "host8"  "host9"  "host10"

Only, instead of 10, you'll need to pass something like nrow(df) or length(x), depending on your initial object.
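
To make that concrete, here is a minimal sketch for both cases. The inputs hosts_vec and hosts_df are made-up names for illustration, not the original poster's data. seq_along() and seq_len() are used rather than 1:n because they behave sensibly when the input is empty.

```r
# Hypothetical inputs, made up for illustration
hosts_vec <- c("aaa", "bbb", "ccc")
hosts_df  <- data.frame(host_name = hosts_vec, stringsAsFactors = FALSE)

# For a vector: one friendly name per element
vec_names <- paste0("host", seq_along(hosts_vec))

# For a data frame: one friendly name per row
hosts_df$name <- paste0("host", seq_len(nrow(hosts_df)))

print(vec_names)
print(hosts_df)
```

Note that this numbers rows, not distinct hosts, so a hostname that appears twice would get two different names.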

Or maybe something like this:

I start with a data frame named df containing one column, names:

df
#>         names
#> 1  wyezsnmpct
#> 2  loifrapnuq
#> 3  mcotjfeglb
#> 4  zdaelstqor
#> 5  soxtzagqkr
#> 6  rjocznhtqu
#> 7  zspjlkfwat
#> 8  zmqtpdyxcw
#> 9  ldryxkighq
#> 10 eylhsudnom

Then using the dplyr package I calculate a new column based on the row number:

library(dplyr)

df %>%
  mutate(nice_name = paste0("host_", row_number()))
#>         names nice_name
#> 1  wyezsnmpct    host_1
#> 2  loifrapnuq    host_2
#> 3  mcotjfeglb    host_3
#> 4  zdaelstqor    host_4
#> 5  soxtzagqkr    host_5
#> 6  rjocznhtqu    host_6
#> 7  zspjlkfwat    host_7
#> 8  zmqtpdyxcw    host_8
#> 9  ldryxkighq    host_9
#> 10 eylhsudnom   host_10

Created on 2019-01-10 by the reprex package (v0.2.1)

It's stored in a data frame as column.

Something like:

library(tidyverse)
df <- tibble(host_name = c(
             "95b4ae6d890e4c46986d91d7ac4bf08200000W",
             "95b4ae6d890e4c46986d91d7ac4bf08200000W",
             "95b4ae6d890e4c46986d91d7ac4bf08200000V",
             "95b4ae6d890e4c46986d91d7ac4bf08200000V",
             "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
             "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
             "95b4ae6d890e4c46986d91d7ac4bf082000011",
             "95b4ae6d890e4c46986d91d7ac4bf082000011",
             "95b4ae6d890e4c46986d91d7ac4bf082000011",
             "95b4ae6d890e4c46986d91d7ac4bf082000011",
             "95b4ae6d890e4c46986d91d7ac4bf08200000H",
             "95b4ae6d890e4c46986d91d7ac4bf08200000H"))

# paste0() rather than paste(): paste() would insert a space ("host 1")
df <- cbind(df, name = paste0("host", seq_len(nrow(df))))

Gives you this:

                                host_name   name
1  95b4ae6d890e4c46986d91d7ac4bf08200000W  host1
2  95b4ae6d890e4c46986d91d7ac4bf08200000W  host2
3  95b4ae6d890e4c46986d91d7ac4bf08200000V  host3
4  95b4ae6d890e4c46986d91d7ac4bf08200000V  host4
5  95b4ae6d890e4c46986d91d7ac4bf08200000Z  host5
6  95b4ae6d890e4c46986d91d7ac4bf08200000Z  host6
7  95b4ae6d890e4c46986d91d7ac4bf082000011  host7
8  95b4ae6d890e4c46986d91d7ac4bf082000011  host8
9  95b4ae6d890e4c46986d91d7ac4bf082000011  host9
10 95b4ae6d890e4c46986d91d7ac4bf082000011 host10
11 95b4ae6d890e4c46986d91d7ac4bf08200000H host11
12 95b4ae6d890e4c46986d91d7ac4bf08200000H host12

Yes! I wanted this, but couldn't remember the function for getting the index / row number. Apparently, it is row_number(). Who would have thought.

The solutions posted here do not account for the fact that some of your hosts are the same.
When I need to enumerate items, I use this trick:

x <- c(
  "95b4ae6d890e4c46986d91d7ac4bf08200000W",
  "95b4ae6d890e4c46986d91d7ac4bf08200000W",
  "95b4ae6d890e4c46986d91d7ac4bf08200000V",
  "95b4ae6d890e4c46986d91d7ac4bf08200000V",
  "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
  "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf08200000H",
  "95b4ae6d890e4c46986d91d7ac4bf08200000H"
)

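paste0("host", as.integer(factor(x)))

which gives you

 [1] "host3" "host3" "host2" "host2" "host4" "host4" "host5" "host5" "host5" "host5" "host1" "host1"

edit: I briefly swapped as.integer(as.factor(x)) for xtfrm(), but for a character vector xtfrm() ranks ties with gaps (here 5, 5, 3, 3, 7, ...), so the factor version is the one that gives the output above.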
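
The numbering above follows the sorted order of the IDs. If you would rather number hosts in the order they first appear, one option (a sketch, not the only way) is match() against unique(). The single-letter values below are shortened stand-ins for the real host IDs.

```r
# Shortened stand-ins for the real host IDs (illustration only)
x <- c("W", "W", "V", "V", "Z", "H")

# unique(x) keeps the first occurrence of each value, in order of appearance;
# match() then returns each element's position in that de-duplicated vector
by_appearance <- paste0("host", match(x, unique(x)))

print(by_appearance)
# "host1" "host1" "host2" "host2" "host3" "host4"
```

Identical IDs always get the same name, and the result has the same length as the input.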

The only issue here is that the same hostname may appear more than once.

How? It depends on row numbers, which are sequential and unique (think index)

Never mind me, I'm an idiot. I see it now.

Ohhh... well, @hoelk is spot on with his solution. We could also do this in a more tidyverse way using the power of group_by:


library(tidyverse)
df <- tibble(host_name = c(
  "95b4ae6d890e4c46986d91d7ac4bf08200000W",
  "95b4ae6d890e4c46986d91d7ac4bf08200000W",
  "95b4ae6d890e4c46986d91d7ac4bf08200000V",
  "95b4ae6d890e4c46986d91d7ac4bf08200000V",
  "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
  "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf08200000H",
  "95b4ae6d890e4c46986d91d7ac4bf08200000H"))

df %>%
  group_by(host_name) %>%
  summarize() %>%
  mutate(nice_name = paste0("host_", row_number()))
#> # A tibble: 5 x 2
#>   host_name                              nice_name
#>   <chr>                                  <chr>    
#> 1 95b4ae6d890e4c46986d91d7ac4bf08200000H host_1   
#> 2 95b4ae6d890e4c46986d91d7ac4bf08200000V host_2   
#> 3 95b4ae6d890e4c46986d91d7ac4bf08200000W host_3   
#> 4 95b4ae6d890e4c46986d91d7ac4bf08200000Z host_4   
#> 5 95b4ae6d890e4c46986d91d7ac4bf082000011 host_5

Created on 2019-01-10 by the reprex package (v0.2.1)

Yes. Or, instead of group_by(), do df %>% select(host_name) %>% distinct() to get a dim "lookup" table of distinct names (that's what I thought this table column was!), and engineer friendly names there.
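
A minimal sketch of that distinct() idea, assuming dplyr is installed. The single-letter host names are shortened stand-ins for the real IDs, and lookup is just a name picked for illustration.

```r
library(dplyr)

# Shortened stand-ins for the real host IDs (illustration only)
df <- tibble(host_name = c("W", "W", "V", "V", "Z", "H"))

# One row per distinct host, in order of first appearance,
# with a friendly name derived from the row number
lookup <- df %>%
  distinct(host_name) %>%
  mutate(nice_name = paste0("host_", row_number()))

print(lookup)
```

Unlike group_by() followed by summarize(), distinct() does not sort the host names, so the numbering follows the order in which hosts first show up.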

Thanks for this! I don't need them to be grouped by host_name. If I remove group_by, some hostnames get more than one name.

Well, you kind of do, whether it is group_by() or distinct(), you'd need to make a list of distinct host names. You'd obviously handle it separately in a different table. Think dimensional table in a relational database...

My 2 cents, FWIW. I may be wrong.

I'm just using group_by for the side effect that it makes things unique. Taras recommended distinct (great choice) or even unique which is another option.



library(tidyverse)
df <- tibble(host_name = c(
  "95b4ae6d890e4c46986d91d7ac4bf08200000W",
  "95b4ae6d890e4c46986d91d7ac4bf08200000W",
  "95b4ae6d890e4c46986d91d7ac4bf08200000V",
  "95b4ae6d890e4c46986d91d7ac4bf08200000V",
  "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
  "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf08200000H",
  "95b4ae6d890e4c46986d91d7ac4bf08200000H"))

df %>%
  unique() %>%
  mutate(nice_name = paste0("host_", row_number()))
#> # A tibble: 5 x 2
#>   host_name                              nice_name
#>   <chr>                                  <chr>    
#> 1 95b4ae6d890e4c46986d91d7ac4bf08200000W host_1   
#> 2 95b4ae6d890e4c46986d91d7ac4bf08200000V host_2   
#> 3 95b4ae6d890e4c46986d91d7ac4bf08200000Z host_3   
#> 4 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4   
#> 5 95b4ae6d890e4c46986d91d7ac4bf08200000H host_5

Created on 2019-01-10 by the reprex package (v0.2.1)

Fake news, I recommended distinct()! :smiley: (I guess they give same results though, so pick your poison)
There are many paths to one... solution :wink:

did not.. YOU'RE fake news!

Ok, so I changed it while you were responding :slight_smile:

Thanks again! This doesn't give me what I am after. I need to keep the same number of host names. The above example still summarises the host names. I want to see each host name appear as many times as it does in the original data. Thanks

oh... well just join it back to your original data:

library(tidyverse)
df <- tibble(host_name = c(
  "95b4ae6d890e4c46986d91d7ac4bf08200000W",
  "95b4ae6d890e4c46986d91d7ac4bf08200000W",
  "95b4ae6d890e4c46986d91d7ac4bf08200000V",
  "95b4ae6d890e4c46986d91d7ac4bf08200000V",
  "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
  "95b4ae6d890e4c46986d91d7ac4bf08200000Z",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf082000011",
  "95b4ae6d890e4c46986d91d7ac4bf08200000H",
  "95b4ae6d890e4c46986d91d7ac4bf08200000H"))

df %>%
  unique() %>%
  mutate(nice_name = paste0("host_", row_number())) %>%
  left_join(df)
#> Joining, by = "host_name"
#> # A tibble: 12 x 2
#>    host_name                              nice_name
#>    <chr>                                  <chr>    
#>  1 95b4ae6d890e4c46986d91d7ac4bf08200000W host_1   
#>  2 95b4ae6d890e4c46986d91d7ac4bf08200000W host_1   
#>  3 95b4ae6d890e4c46986d91d7ac4bf08200000V host_2   
#>  4 95b4ae6d890e4c46986d91d7ac4bf08200000V host_2   
#>  5 95b4ae6d890e4c46986d91d7ac4bf08200000Z host_3   
#>  6 95b4ae6d890e4c46986d91d7ac4bf08200000Z host_3   
#>  7 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4   
#>  8 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4   
#>  9 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4   
#> 10 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4   
#> 11 95b4ae6d890e4c46986d91d7ac4bf08200000H host_5   
#> 12 95b4ae6d890e4c46986d91d7ac4bf08200000H host_5

Created on 2019-01-10 by the reprex package (v0.2.1)
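
One thing to watch: joining from the lookup side returns rows in the lookup's order, which only matches the original here because identical hosts happen to sit next to each other. If the original row order or other columns matter, a sketch of the join in the other direction might look like this. The cpu_load column and the single-letter host names are invented for illustration.

```r
library(dplyr)

# Made-up data: interleaved hosts plus a hypothetical extra column
df <- tibble(
  host_name = c("W", "V", "W", "Z", "V"),
  cpu_load  = c(0.2, 0.5, 0.3, 0.9, 0.4)
)

# Lookup table: one friendly name per distinct host
lookup <- df %>%
  distinct(host_name) %>%
  mutate(nice_name = paste0("host_", row_number()))

# Start from the original data so its row order and columns are kept;
# an explicit `by` also avoids the "Joining, by = ..." message
result <- df %>%
  left_join(lookup, by = "host_name")

print(result)
```

Each row of df keeps its position, and repeated hosts share the same nice_name.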
