[Arrow] str_detect and arrow_match_substring_regex automatically pulling into R

Hi Posit users,

I have also posted about this to the Arrow devs, but I was wondering whether anyone here has seen this before and has any advice.

I'm trying to either filter an Arrow table or use if_else to set a new column to 0 or 1 based on the presence of a specific string. The commands I would normally use for this insist on pulling the entire table into R before doing the filtering, which I would rather avoid.

That is odd, considering str_detect is a compatible function in arrow.

Instead, it behaves like so:

LargeTable |> filter(arrow_match_substring_regex(Column_of_interest,{pattern = "keyword"}))
Warning: Expression arrow_match_substring_regex(Column_of_interest, {... not supported in Arrow; pulling data into R

LargeTable |> filter(str_detect("keyword",Column_of_interest))
Warning: Expression str_detect("keyword", Column_of_interest) not supported in Arrow; pulling data into R

LargeTable |> mutate(Row_count = if_else((str_detect("keyword", Column_of_interest)),1,0,missing=0))
Warning: Expression if_else((str_detect("keyword", Column_of_interest)), 1, 0, missing = 0) not supported in Arrow; pulling data into R

LargeTable |> mutate(Row_count = if_else(arrow_match_substring_regex(Column_of_interest,{pattern = "keyword"}),1,0,missing=0))
Warning: Expression if_else(arrow_match_substring_regex(Column_of_interest, {... not supported in Arrow; pulling data into R

Is there something wrong with my syntax here?

Thanks in advance!

It seems I was providing the arguments in the wrong order: str_detect takes the string (here, the column) first and the pattern second.

They should be:

LargeTable |> filter(str_detect(Column_of_interest,"keyword"))

LargeTable |> mutate(Row_count = if_else(str_detect(Column_of_interest, "keyword"), 1, 0, missing = 0))

For the R syntax at least.
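
In case it helps anyone landing here, this is a minimal sketch of the corrected argument order on a tiny stand-in table. The three rows are made up for illustration, and it assumes a reasonably recent arrow release in which if_else accepts the missing argument.

```r
library(arrow)
library(dplyr)
library(stringr)

# Tiny stand-in for LargeTable; the contents are invented for illustration.
LargeTable <- arrow_table(
  Column_of_interest = c("has keyword here", "nothing", NA)
)

# String (column) first, pattern second.
filtered <- LargeTable |>
  filter(str_detect(Column_of_interest, "keyword")) |>
  collect()

# Flag rows containing the keyword; NA rows fall back to `missing`.
result <- LargeTable |>
  mutate(Row_count = if_else(str_detect(Column_of_interest, "keyword"), 1, 0, missing = 0)) |>
  collect()
```

Nothing is materialised in R until collect() is called, so on a real table the filtering and the new column should be evaluated by Arrow.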

I'm still unsure about the arrow_match_substring_regex call.
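
My best guess for the direct Arrow call, offered as a sketch rather than a confirmed fix: as far as I understand it, the arrow_-prefixed compute functions take their settings through an options list, so the braces in {pattern = "keyword"} are just an R code block that Arrow cannot translate. Assuming the installed arrow version exposes match_substring_regex this way, something like the following should stay in Arrow (the two-row table is again a made-up stand-in):

```r
library(arrow)
library(dplyr)

# Made-up stand-in for LargeTable.
LargeTable <- arrow_table(
  Column_of_interest = c("has keyword here", "nothing")
)

# Pass the regex through `options = list(...)` rather than `{...}`.
matches <- LargeTable |>
  filter(arrow_match_substring_regex(Column_of_interest, options = list(pattern = "keyword"))) |>
  collect()
```

In practice str_detect(Column_of_interest, "keyword") is probably the simpler choice, since arrow translates it for you.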

Apologies!
