But I was wondering if anyone here had seen this before and had any advice.
I'm trying to either filter an arrow table or use if_else to set a new column to 0 or 1 depending on whether a specific string is present. However, the commands I would normally use insist on pulling the entire table into R before doing the filtering, which I would rather avoid.
That is odd, considering str_detect is a compatible function in arrow.
Instead, it behaves like so:
LargeTable |> filter(arrow_match_substring_regex(Column_of_interest,{pattern = "keyword"}))
Warning: Expression arrow_match_substring_regex(Column_of_interest, {... not supported in Arrow; pulling data into R
LargeTable |> filter(str_detect("keyword",Column_of_interest))
Warning: Expression str_detect("keyword", Column_of_interest) not supported in Arrow; pulling data into R
LargeTable |> mutate(Row_count = if_else((str_detect("keyword", Column_of_interest)),1,0,missing=0))
Warning: Expression if_else((str_detect("keyword", Column_of_interest)), 1, 0, missing = 0) not supported in Arrow; pulling data into R
LargeTable |> mutate(Row_count = if_else(arrow_match_substring_regex(Column_of_interest,{pattern = "keyword"}),1,0,missing=0))
Warning: Expression if_else(arrow_match_substring_regex(Column_of_interest, {... not supported in Arrow; pulling data into R
It turns out I was providing the arguments in the wrong order: str_detect takes the string (here, the column) first and the pattern second.
The calls should be:
LargeTable |> filter(str_detect(Column_of_interest,"keyword"))
LargeTable |> mutate(Row_count= if_else((str_detect(Column_of_interest, "keyword")),1,0,missing=0))
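For anyone who wants to check this on a small example, here is a minimal sketch of the corrected argument order. The table `small_table` and its contents are made up for illustration (the real LargeTable is not shown), and it assumes a reasonably recent version of the arrow package with the dplyr bindings for str_detect and if_else available.

```r
library(arrow)
library(dplyr)
library(stringr)

# Hypothetical stand-in for LargeTable; the values are invented for this sketch.
small_table <- arrow_table(
  Column_of_interest = c("has keyword here", "nothing", NA, "keyword")
)

# str_detect(string, pattern): the column goes first, the pattern second.
query <- small_table |>
  mutate(Row_count = if_else(str_detect(Column_of_interest, "keyword"), 1, 0, missing = 0))

# If the expression was translated to Arrow, this is still a lazy query
# rather than a data frame that has been pulled into R.
still_lazy <- inherits(query, "arrow_dplyr_query")

# Only now is the (already computed) result brought into R.
result <- collect(query)

# The same argument order applies to filter().
filtered <- small_table |>
  filter(str_detect(Column_of_interest, "keyword")) |>
  collect()
```

If `still_lazy` is TRUE and no "pulling data into R" warning appears, the work is being done by Arrow. With the arguments swapped, the column ends up in the pattern position, which is presumably why arrow could not translate the call.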