I am a newbie to R and stuck in brining the data in analyzable format. I have the data in the format as below:
Item
Item #
Date
Type
Description
A
100
22/01/2022
511
Desc1
B
200
15/01/2022
511
Desc2
A
100
20/03/2022
711
Desc1
B
200
16/02/2022
711
Desc2
A
300
10/02/2022
511
Desc3
Required output format:
Item
Item #
Out Date
In Date
Description
A
100
22/01/2022
20/03/2022
Desc1
B
200
15/01/2022
16/02/2022
Desc2
These are the outbound and inbound transactions. Type 511 are outbound and Type 711 are inbound. I need to show the data as against "Item" and "Item #" match. Hope I am able to explain the requirement.
It is a typical application scenario for 'pivot()' functions from tidyr.
here I replaced the Type code with some specified words at first, then used the pivot_longer() function to expand the dataframe according to the Type key:
library(tidyverse)
df1 <- tibble::tribble(
~Item,~`Item#`, ~Date, ~Type, ~Description,
'A', 100, '22/01/2022', 511, 'Desc1',
'B', 200, '15/01/2022', 511, 'Desc2',
'A', 100, '20/03/2022', 711, 'Desc1',
'B', 200, '16/02/2022', 711, 'Desc2',
'A', 300, '10/02/2022', 511, 'Desc3',
)
df1 %>% mutate(Type = case_when(
Type == 511 ~ 'outbound',
Type == 711 ~ 'inbound',
TRUE ~ 'NA') # this row point to Type code for other situations
) %>% pivot_wider(
names_from = 'Type',
values_from = 'Date'
)
# A tibble: 3 x 5
Item `Item#` Description outbound inbound
<chr> <dbl> <chr> <chr> <chr>
1 A 100 Desc1 22/01/2022 20/03/2022
2 B 200 Desc2 15/01/2022 16/02/2022
3 A 300 Desc3 10/02/2022 NA
Many Thanks for providing the solution. Both your solutions are working fine for the provided dataset. Now I have got more columns in the dataset with millions of rows and I need to bring-in quantity as well in the output.
My bad, I am not been able to use your code to get the required output Can you please help?
Your sample csv doesnt contain quality as my example did. You have Quantity.
I cant comment on the error you shared as it doesnt seem to relate to any of the code under discussion, or at least, you havent provided me a way to trigger that error.
nirgrahamuk's solution is enough for use in my opinion, unless you only want a clean output as you required, some additional selection of cols and unnest of the output is needed.
df1 <- read_csv('sample.csv')
df1 %>%
select(-c(Material,Plant,Region,User)) %>%
mutate(Type = case_when(
Type == 511 ~ 'outbound',
Type == 711 ~ 'inbound',
TRUE ~ 'NA')
) %>% pivot_wider(
values_fn = list,
names_from = 'Type',
values_from = c('Date','Quantity')
) %>% unnest(matches("inbound|outbound"))
# A tibble: 5 x 7
Item ItemN Description Date_outbound Date_inbound Quantity_outbound Quantity_inbound
<chr> <dbl> <chr> <chr> <chr> <dbl> <dbl>
1 A 100 Desc1 22/01/2022 20/03/2022 10 5
2 A 100 Desc1 22/01/2022 20/03/2022 10 3
3 A 100 Desc1 22/01/2022 20/03/2022 10 1
4 B 200 Desc2 15/01/2022 16/02/2022 5 5
5 A 300 Desc3 10/02/2022 NA 3 NA
In addition, because the sample data is getting more complex, I strongly recommend you figure out key data fields that are required to construct the result. For example, I see a column named Plant, constructing the inbound/outbound record for each plant may be more meaningful.
And I don't think that the error relates to code from our provided solution, too.