I am using dbplyr/dplyr to query a redshift database.
I have a query that looks like this:
tbl(con, "table") %>%
mutate(random_number = sql("RANDOM()"))
or this
tbl(con, "table") %>%
mutate(random_number = random())
How can I set a seed so that the results are returned predictably? I see in the redshift documentation I would write:
set seed to .25;
select cast (random() * 100 as int);
int4
Is there a straightforward way to incorporate this into my dplyr chain?
thanks!
I must not understand well. This snippet shows one way to make it predictable, but there's no advantage over just using a constant.
rando <- function()
{set.seed(42)
return(sample(100:500,1))
}
rando()
#> [1] 148
rando()
#> [1] 148
(head(mtcars[1:5,1:5]) |> dplyr::mutate(random_number = rando()))
#> mpg cyl disp hp drat random_number
#> Mazda RX4 21.0 6 160 110 3.90 148
#> Mazda RX4 Wag 21.0 6 160 110 3.90 148
#> Datsun 710 22.8 4 108 93 3.85 148
#> Hornet 4 Drive 21.4 6 258 110 3.08 148
#> Hornet Sportabout 18.7 8 360 175 3.15 148
Created on 2023-05-30 with reprex v2.0.2
This only works if you are working with data locally - not in a db backend.
redshift
must work differently from other backend setups like MySQL
, then, where a con
streams to the local process.
system
Closed
5
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.