Notes:
- It would help if you shared usable data for questions like this (e.g. via dput()). I've included some in my answer.
- I'm including code that tries to simulate your particular use case, using the randomNames package to generate a data frame with 30000 IDs and random names.
- Though you haven't explicitly stated this, I deduce from the code provided that you'd prefer a base R solution. That would definitely not be my preference, but I've tried to hack together a base R solution which, while perhaps not particularly elegant, should at least contain all of the pieces you'd need to tailor your own solution.
- I also provide a tidyverse solution, which would be my preference (and which others may find useful).
- NB: Neither of these approaches is particularly performant as the data grows (I haven't benchmarked either against your loop, but I'd be surprised if they were markedly faster). With the caveat that I have pretty much zero familiarity with the sodium package, I think this is mainly because you generate a unique nonce for every row of your data, effectively forcing sodium's vectorised data_encrypt function to operate like a non-vectorised function. It is not immediately clear to me why one would want to do this.
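As an aside on the first note, here is a minimal sketch of the dput() workflow, using only base R. The data frame df_example is a made-up stand-in, not your data. dput() prints R code that rebuilds the object, so whoever answers can paste it straight into their session.

```r
# hypothetical toy data frame standing in for the real data
df_example <- data.frame(
  ID = 1:3,
  Name = c("John", "Peter", "Mark"),
  stringsAsFactors = FALSE
)

# print a copy-pasteable definition of the object
dput(df_example)

# round trip: capture the printed code and rebuild the object from it
txt <- paste(capture.output(dput(df_example)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
same <- identical(df_example, rebuilt)
```

For a large data frame, something like dput(head(df, 20)) keeps the pasted output manageable.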
Data:
# sha256(), random(), data_encrypt() and bin2hex() come from the sodium package
library(sodium)
# create data
df <- structure(
  list(
    ID = c(1, 2, 3, 4, 5),
    Name = c("John", "Peter", "Mark", "Joe", "Tim"),
    Attr = c("A1", "A1", "A2", "A2", "A3")
  ),
  row.names = c(NA, -5L),
  class = c("tbl_df", "tbl", "data.frame")
)
# create "large", "simulated" data frame to replicate user's use-case
df_large <- data.frame(
  ID = 1:30000,
  Name = randomNames::randomNames(30000, which.names = 'first')
)
# create encryption key (that works with sodium functions)
key <- sha256(charToRaw("somespecialkey"))
Base R approach:
# function to encrypt a vector of inputs with a given key and return a data frame
# with a column for the resultant encrypted input and a column for the nonce values
encrypt_vector <- function(input, key) {
  list_output <- lapply(input, function(x) {
    # 1. Generate a separate nonce for each row of data.
    rnonce <- random(24)
    # 2. Serialize and encrypt the value
    serializedName <- serialize(x, NULL)
    cipher <- data_encrypt(serializedName, key, rnonce)
    # 3. Store the encrypted name and the nonce as hex strings
    list(name = bin2hex(cipher), nonce = bin2hex(rnonce))
  })
  # build plain character columns (rbind-ing the lists would give list columns)
  data.frame(
    name = vapply(list_output, `[[`, character(1), "name"),
    nonce = vapply(list_output, `[[`, character(1), "nonce"),
    stringsAsFactors = FALSE
  )
}
# apply to small data
x <- encrypt_vector(df$Name, key)
df$Name <- x$name
df$nonce <- x$nonce
# apply to large data (slow)
x <- encrypt_vector(df_large$Name, key)
df_large$Name <- x$name
df_large$nonce <- x$nonce
Tidyverse approach:
# map(), tibble(), mutate(), unnest() and %>% come from the tidyverse
library(tidyverse)
# function to encrypt a vector of inputs with a given key and return a list of
# one-row tibbles, each holding the encrypted input and its nonce
encrypt_vector <- function(input, key) {
  map(input, function(x) {
    # 1. Generate a separate nonce for each row of data.
    rnonce <- random(24)
    # 2. Serialize and encrypt the value
    serializedName <- serialize(x, NULL)
    cipher <- data_encrypt(serializedName, key, rnonce)
    # 3. Store the encrypted name and the nonce as hex strings
    tibble(name = bin2hex(cipher), nonce = bin2hex(rnonce))
  })
}
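# NB: run this on fresh copies of df and df_large, not on the ones already
# modified by the base R code above (they already have a nonce column)
# apply to small data
df <- df %>%
  mutate(encryption = encrypt_vector(Name, key)) %>%
  select(-Name) %>%
  unnest(encryption) %>%
  rename(Name = name)
# apply to large data
df_large <- df_large %>%
  mutate(encryption = encrypt_vector(Name, key)) %>%
  select(-Name) %>%
  unnest(encryption) %>%
  rename(Name = name)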
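
To check that either approach is reversible, a round trip along the following lines should work. This is only a sketch: decrypt_value is a hypothetical helper name of my own, and it assumes the value was serialized, encrypted and hex-encoded exactly as in encrypt_vector above. It relies on sodium's hex2bin() and data_decrypt(), the counterparts of the functions used for encryption.

```r
library(sodium)

key <- sha256(charToRaw("somespecialkey"))

# encrypt a single value the same way encrypt_vector() does
rnonce <- random(24)
cipher <- data_encrypt(serialize("John", NULL), key, rnonce)
row <- list(name = bin2hex(cipher), nonce = bin2hex(rnonce))

# hypothetical helper: reverse the hex encoding, decrypt, then unserialize
decrypt_value <- function(name_hex, nonce_hex, key) {
  raw_cipher <- hex2bin(name_hex)
  raw_nonce <- hex2bin(nonce_hex)
  unserialize(data_decrypt(raw_cipher, key, raw_nonce))
}

decrypted <- decrypt_value(row$name, row$nonce, key)
```

Applied to a whole data frame, this could be mapped over the Name and nonce columns, for example with mapply(decrypt_value, df$Name, df$nonce, MoreArgs = list(key = key)).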