Actually, I am trying a semi_join now. I was thinking about it from a filter perspective, but I'll give semi_join() a try. Thanks so much for your help! Do you suggest I use collect() at the end of the code block with semi_join()?
If you want to get the dataframe back into your R session you can use collect(), but since the result needs to fit in memory in R, the data frame has to be small enough.
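As a rough sketch of that pattern (the table and column names here are placeholders, not taken from your actual code): the semi_join() runs inside Spark, and collect() is only called once at the end on the filtered result.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Hypothetical Spark tables standing in for your two data frames
df1_tbl <- copy_to(sc, df1, "df1", overwrite = TRUE)
df2_tbl <- copy_to(sc, df2, "df2", overwrite = TRUE)

# Keep rows of df1 whose key also appears in df2 -- this all executes in Spark
result_sdf <- df1_tbl %>%
  semi_join(df2_tbl, by = "id")

# Only bring the (hopefully much smaller) result back into R at the very end
result_df <- result_sdf %>% collect()
```

If the result is still too large to collect, you can keep working with result_sdf inside Spark, or write it out with something like spark_write_parquet() instead.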
With Spark, as I understand it, each step in a pipeline creates a lazy operation; everything only actually runs when you call collect(). So I wonder whether the issue doesn't come from somewhere earlier, when you create df2_n2_s...
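To illustrate that point with a small sketch (df2_tbl and the columns n2 / n2_s are hypothetical stand-ins, not from the original code): the pipeline below only builds a query plan, so a mistake in an intermediate step typically doesn't surface until the data is actually requested.

```r
library(sparklyr)
library(dplyr)

# Nothing executes here: this only builds up a lazy Spark SQL plan
lazy_sdf <- df2_tbl %>%
  mutate(n2_s = n2 * 2) %>%   # hypothetical intermediate column
  filter(n2_s > 10)

# The query -- and any error hiding in an earlier step -- only runs
# when results are demanded, e.g. via collect(), compute(), or printing
lazy_sdf %>% head() %>% collect()
```

That's why an error reported at collect() can actually originate from a step defined much earlier in the chain.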
The connection looks like it is working alright; I'm thinking it doesn't like filtering by a matching column from a different df. When I used semi_join() it appeared to work, but I'm still figuring out how to look at the structure of the dataframe in the connection, haha. I'm still getting used to the difference between working in a Spark context vs. with R objects in the global environment.
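For inspecting the structure of a table that lives in Spark, a few sparklyr/dplyr helpers work without pulling the full data back into R (result_sdf here is a placeholder for whatever tbl_spark you're looking at):

```r
library(sparklyr)
library(dplyr)

# Column names and Spark data types, without moving any rows to R
sdf_schema(result_sdf)

# Row and column counts, computed inside Spark
sdf_dim(result_sdf)

# Preview a handful of rows -- head() limits the query before collecting
result_sdf %>% head(10) %>% collect()
```

Unlike str() on a regular R data frame, these operate on the remote table, so they stay cheap even when the underlying data is large.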