I have a list of hostnames that I would like to convert to friendlier names in R. Is this possible, please?
Host name
95b4ae6d890e4c46986d91d7ac4bf08200000W
95b4ae6d890e4c46986d91d7ac4bf08200000W
95b4ae6d890e4c46986d91d7ac4bf08200000V
95b4ae6d890e4c46986d91d7ac4bf08200000V
95b4ae6d890e4c46986d91d7ac4bf08200000Z
95b4ae6d890e4c46986d91d7ac4bf08200000Z
95b4ae6d890e4c46986d91d7ac4bf082000011
95b4ae6d890e4c46986d91d7ac4bf082000011
95b4ae6d890e4c46986d91d7ac4bf082000011
95b4ae6d890e4c46986d91d7ac4bf082000011
95b4ae6d890e4c46986d91d7ac4bf08200000H
95b4ae6d890e4c46986d91d7ac4bf08200000H
jdlong
January 10, 2019, 7:05pm
2
You could do this all sorts of ways. What did you have in mind?
You could map each of these to a number. Or you could map each to the name of a former President of the US. Or you could make each of them a noble gas.
I was hoping for host1,host2,host3, and so on. Just to make it more readable.
taras
January 10, 2019, 7:13pm
4
How is this stored? A list, a vector, a column of a table?
In a nutshell, my idea would be to generate a vector of friendly names, and then cbind it to the table, or pass it into a list.
E.g.
paste0("host", seq_len(10))
gives you this:
[1] "host1" "host2" "host3" "host4" "host5" "host6" "host7" "host8" "host9" "host10"
Only instead of 10 you'll need to pass something like nrow or length, depending on your initial object.
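As a rough sketch of that idea, assuming a data frame called df with a single host_name column (both names are placeholders, since the real object has not been shown yet), the hard-coded 10 can be swapped for the row count:

```r
# Hypothetical data frame; the column name host_name is an assumption.
df <- data.frame(
  host_name = c("aaa", "bbb", "ccc"),
  stringsAsFactors = FALSE
)

# One friendly name per row; seq_len(n) is 1, 2, ..., n (and empty when n is 0).
friendly <- paste0("host", seq_len(nrow(df)))

# Attach it as a new column. For a plain vector x, use seq_len(length(x)) instead.
df$name <- friendly
df
```

Note that this numbers rows, so a host that appears on several rows would get several different names.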
jdlong
January 10, 2019, 7:15pm
5
Or maybe something like this:
I start with a data frame named df containing one column, names:
df
#> names
#> 1 wyezsnmpct
#> 2 loifrapnuq
#> 3 mcotjfeglb
#> 4 zdaelstqor
#> 5 soxtzagqkr
#> 6 rjocznhtqu
#> 7 zspjlkfwat
#> 8 zmqtpdyxcw
#> 9 ldryxkighq
#> 10 eylhsudnom
Then, using the dplyr package, I calculate a new column based on the row number:
library(dplyr)
df %>%
mutate(nice_name = paste0("host_", row_number()))
#> names nice_name
#> 1 wyezsnmpct host_1
#> 2 loifrapnuq host_2
#> 3 mcotjfeglb host_3
#> 4 zdaelstqor host_4
#> 5 soxtzagqkr host_5
#> 6 rjocznhtqu host_6
#> 7 zspjlkfwat host_7
#> 8 zmqtpdyxcw host_8
#> 9 ldryxkighq host_9
#> 10 eylhsudnom host_10
Created on 2019-01-10 by the reprex package (v0.2.1)
It's stored in a data frame as column.
taras
January 10, 2019, 7:18pm
7
Something like:
library(tidyverse)
df <- tibble(host_name = c(
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf08200000H",
"95b4ae6d890e4c46986d91d7ac4bf08200000H"))
df <- cbind(df, name = paste0("host", seq_len(nrow(df))))
Gives you this:
host_name name
1 95b4ae6d890e4c46986d91d7ac4bf08200000W host1
2 95b4ae6d890e4c46986d91d7ac4bf08200000W host2
3 95b4ae6d890e4c46986d91d7ac4bf08200000V host3
4 95b4ae6d890e4c46986d91d7ac4bf08200000V host4
5 95b4ae6d890e4c46986d91d7ac4bf08200000Z host5
6 95b4ae6d890e4c46986d91d7ac4bf08200000Z host6
7 95b4ae6d890e4c46986d91d7ac4bf082000011 host7
8 95b4ae6d890e4c46986d91d7ac4bf082000011 host8
9 95b4ae6d890e4c46986d91d7ac4bf082000011 host9
10 95b4ae6d890e4c46986d91d7ac4bf082000011 host10
11 95b4ae6d890e4c46986d91d7ac4bf08200000H host11
12 95b4ae6d890e4c46986d91d7ac4bf08200000H host12
taras
January 10, 2019, 7:21pm
8
Yes! I wanted this, but couldn't remember the function for getting the index / row number. Apparently, it is row_number(). Who would have thought.
hoelk
January 10, 2019, 7:21pm
9
The solutions posted here do not account for the fact that some of your hosts are the same.
When I need to enumerate items, I use this trick:
x <- c(
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf08200000H",
"95b4ae6d890e4c46986d91d7ac4bf08200000H"
)
paste0("host", xtfrm(x))
which gives you
[1] "host3" "host3" "host2" "host2" "host4" "host4" "host5" "host5" "host5" "host5" "host1" "host1"
edit: originally had the hacky as.integer(as.factor(x)) till I remembered xtfrm()
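One caveat worth checking on your own R version: the default xtfrm() method for a character vector appears to rely on rank() with ties.method = "min", which would leave gaps in the numbering when hosts repeat. If consecutive numbers matter, the two base R idioms sketched below do not depend on that behaviour. The shortened host names are made up for illustration.

```r
# Made-up, shortened host names standing in for the real ones.
x <- c("W", "W", "V", "V", "Z", "Z", "11", "11", "H")

# Number hosts in order of first appearance: W = 1, V = 2, Z = 3, ...
by_appearance <- paste0("host", match(x, unique(x)))

# Number hosts by the sorted order of the distinct values
# (the sort order of strings can depend on the locale).
by_sorted <- paste0("host", as.integer(factor(x)))

by_appearance
```

Either way, repeated hosts share one name and the numbers run from 1 to the number of distinct hosts.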
The only issue here is that the same hostname may appear more than once.
taras
January 10, 2019, 7:23pm
11
How? It depends on row numbers, which are sequential and unique (think index)
Never mind me, I'm an idiot. I see it now.
jdlong
January 10, 2019, 7:26pm
12
Ohhh... well, @hoelk is spot on with his solution. We could also do this with a more tidyverse solution using the power of group_by:
library(tidyverse)
df <- tibble(host_name = c(
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf08200000H",
"95b4ae6d890e4c46986d91d7ac4bf08200000H"))
df %>%
group_by(host_name) %>%
summarize() %>%
mutate(nice_name = paste0("host_", row_number()))
#> # A tibble: 5 x 2
#> host_name nice_name
#> <chr> <chr>
#> 1 95b4ae6d890e4c46986d91d7ac4bf08200000H host_1
#> 2 95b4ae6d890e4c46986d91d7ac4bf08200000V host_2
#> 3 95b4ae6d890e4c46986d91d7ac4bf08200000W host_3
#> 4 95b4ae6d890e4c46986d91d7ac4bf08200000Z host_4
#> 5 95b4ae6d890e4c46986d91d7ac4bf082000011 host_5
Created on 2019-01-10 by the reprex package (v0.2.1)
taras
January 10, 2019, 7:32pm
13
Yes. Or, instead of group_by(), do df %>% select(host_name) %>% distinct() to get a dim "lookup" table of distinct names (that's what I thought this table column was!), and engineer friendly names there.
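A minimal sketch of that lookup table, assuming dplyr is installed and using made-up short host names in place of the real ones:

```r
library(dplyr)

# Toy data; host names shortened for illustration.
df <- tibble(host_name = c("W", "W", "V", "V", "Z"))

# One row per distinct host, in order of first appearance,
# with the friendly name built on the lookup table itself.
lookup <- df %>%
  distinct(host_name) %>%
  mutate(nice_name = paste0("host_", row_number()))

lookup
```

Here distinct(host_name) does the work of select() plus distinct() in one call, and lookup ends up with one row per host.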
Thanks for this! I don't need them to be grouped by host_name. If I remove group_by, some hostnames get more than one name.
taras
January 10, 2019, 7:39pm
16
Well, you kind of do: whether it is group_by() or distinct(), you'd need to make a list of distinct host names. You'd obviously handle it separately in a different table. Think dimensional table in a relational database...
My 2 cents, FWIW. I may be wrong.
jdlong
January 10, 2019, 7:40pm
17
user124578:
some
I'm just using group_by for the side effect that it makes things unique. Taras recommended distinct (great choice) or even unique, which is another option.
library(tidyverse)
df <- tibble(host_name = c(
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf08200000H",
"95b4ae6d890e4c46986d91d7ac4bf08200000H"))
df %>%
unique() %>%
mutate(nice_name = paste0("host_", row_number()))
#> # A tibble: 5 x 2
#> host_name nice_name
#> <chr> <chr>
#> 1 95b4ae6d890e4c46986d91d7ac4bf08200000W host_1
#> 2 95b4ae6d890e4c46986d91d7ac4bf08200000V host_2
#> 3 95b4ae6d890e4c46986d91d7ac4bf08200000Z host_3
#> 4 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4
#> 5 95b4ae6d890e4c46986d91d7ac4bf08200000H host_5
Created on 2019-01-10 by the reprex package (v0.2.1)
taras
January 10, 2019, 7:42pm
18
jdlong:
Taras recommended unique
Fake news, I recommended distinct()! (I guess they give the same results though, so pick your poison)
There are many paths to one... solution
jdlong
January 10, 2019, 7:44pm
19
did not.. YOU'RE fake news!
Ok, so I changed it while you were responding
Thanks again! This doesn't give me what I am after. I need to keep the same number of host names. The above example still summarises the host names. I want to see the host name appear more than once. Thanks
jdlong
January 10, 2019, 7:48pm
22
oh... well just join it back to your original data:
library(tidyverse)
df <- tibble(host_name = c(
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000W",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000V",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf08200000Z",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf082000011",
"95b4ae6d890e4c46986d91d7ac4bf08200000H",
"95b4ae6d890e4c46986d91d7ac4bf08200000H"))
df %>%
unique() %>%
mutate(nice_name = paste0("host_", row_number())) %>%
left_join(df)
#> Joining, by = "host_name"
#> # A tibble: 12 x 2
#> host_name nice_name
#> <chr> <chr>
#> 1 95b4ae6d890e4c46986d91d7ac4bf08200000W host_1
#> 2 95b4ae6d890e4c46986d91d7ac4bf08200000W host_1
#> 3 95b4ae6d890e4c46986d91d7ac4bf08200000V host_2
#> 4 95b4ae6d890e4c46986d91d7ac4bf08200000V host_2
#> 5 95b4ae6d890e4c46986d91d7ac4bf08200000Z host_3
#> 6 95b4ae6d890e4c46986d91d7ac4bf08200000Z host_3
#> 7 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4
#> 8 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4
#> 9 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4
#> 10 95b4ae6d890e4c46986d91d7ac4bf082000011 host_4
#> 11 95b4ae6d890e4c46986d91d7ac4bf08200000H host_5
#> 12 95b4ae6d890e4c46986d91d7ac4bf08200000H host_5
Created on 2019-01-10 by the reprex package (v0.2.1)
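If the join feels like a detour, a possible shortcut is to compute the name in place with match(), which keeps every row and the original row order. This is only a sketch on made-up short host names, assuming dplyr is installed:

```r
library(dplyr)

# Toy data; host names shortened for illustration.
df <- tibble(host_name = c("W", "W", "V", "V", "Z"))

# match() returns, for each row, the position of its host among the
# distinct hosts, so repeated hosts share the same number.
named <- df %>%
  mutate(nice_name = paste0("host_", match(host_name, unique(host_name))))

named
```

The numbering follows the order in which each host first appears, the same as the unique() plus left_join() version.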