My motivation:
I have a data frame which is regularly updated with new records, and in which certain columns are variable: sometimes they are present in a data update and sometimes not.
Because I want to save the data frame in a database, I came to the conclusion that the best way to deal with this inconsistency is to create a JSON-type column holding the variably occurring columns, and to store only this JSON column in the database.
My idea is that the newly created JSON column will contain the data from all of the selected columns, but only for a single row/record, to avoid redundancy.
Now I am facing the problem of how to convert the selected columns of the data frame into this new JSON column, row by row.
My dummy DF:
library(tibble)

df <- tribble(
  ~col_1, ~col_2, ~special_col_1, ~special_col_2, ~special_col_3,
     100,    200,            300,            400,            500,
    5000,   6000,           7000,           8000,           9000,
       0,      1,              2,              3,              4
)
so the DF looks like this:
# A tibble: 3 × 5
  col_1 col_2 special_col_1 special_col_2 special_col_3
  <dbl> <dbl>         <dbl>         <dbl>         <dbl>
1   100   200           300           400           500
2  5000  6000          7000          8000          9000
3     0     1             2             3             4
I would like to create a new column, let's call it "gathered_json", which would contain data only from the columns with the prefix "special_col_"; the final resulting DF would then contain only the columns "col_1", "col_2" and "gathered_json".
Also, "gathered_json" would contain only the data from that single row; so, for example, for the first row, df$gathered_json[1] would look something like this:
[
{
"special_col_1": 300,
"special_col_2": 400,
"special_col_3": 500
}
]
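For context, the rough direction I have been experimenting with is the sketch below (using dplyr, tidyr, purrr and jsonlite; I am not sure whether this is correct or idiomatic, so please treat it only as an illustration of what I am after):

library(dplyr)
library(tidyr)
library(purrr)
library(jsonlite)

df_json <- df %>%
  # collapse all "special_col_*" columns into a list-column of one-row tibbles
  nest(gathered = starts_with("special_col_")) %>%
  # serialise each one-row tibble to a JSON string such as
  # [{"special_col_1":300,"special_col_2":400,"special_col_3":500}]
  mutate(gathered_json = map_chr(gathered, ~ as.character(toJSON(.x)))) %>%
  select(-gathered)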
The idea is that when I need to process the data, I will load the data frame from the database and convert the "gathered_json" column back into standard data frame (tibble) columns that I can work with in the usual way.
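For this reverse direction, my rough idea (again only a sketch, continuing from the one above and assuming the column comes back from the database as a plain JSON string) would be something along these lines:

# uses dplyr, tidyr, purrr and jsonlite, as loaded in the previous sketch
df_restored <- df_json %>%
  # parse each JSON string back into a one-row data frame ...
  mutate(gathered = map(gathered_json, fromJSON)) %>%
  select(-gathered_json) %>%
  # ... and unnest it back into ordinary columns
  unnest(gathered)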
I would appreciate any hints or advice on how to approach this problem; if possible, I would welcome a tidyverse-style solution. Many thanks in advance.