Show data based on column values match

abhi_jain80 · May 1, 2022, 12:00pm

Hi folks,

I am a newbie to R and stuck in brining the data in analyzable format. I have the data in the format as below:

Item	Item #	Date	Type	Description
A	100	22/01/2022	511	Desc1
B	200	15/01/2022	511	Desc2
A	100	20/03/2022	711	Desc1
B	200	16/02/2022	711	Desc2
A	300	10/02/2022	511	Desc3

Required output format:

Item	Item #	Out Date	In Date	Description
A	100	22/01/2022	20/03/2022	Desc1
B	200	15/01/2022	16/02/2022	Desc2

These are the outbound and inbound transactions. Type 511 are outbound and Type 711 are inbound. I need to show the data as against "Item" and "Item #" match. Hope I am able to explain the requirement.

Thanks in advance...

yifanliu · May 1, 2022, 1:39pm

It is a typical application scenario for 'pivot()' functions from tidyr.

here I replaced the Type code with some specified words at first, then used the pivot_longer() function to expand the dataframe according to the Type key:

library(tidyverse)

df1 <- tibble::tribble(
  ~Item,~`Item#`,	~Date,	~Type,	~Description,
  'A',	100,	'22/01/2022',	511,	'Desc1',
  'B',	200,	'15/01/2022',	511,	'Desc2',
  'A',	100,	'20/03/2022',	711,	'Desc1',
  'B',	200,	'16/02/2022',	711,	'Desc2',
  'A',	300,	'10/02/2022',	511,	'Desc3',
)

df1 %>% mutate(Type = case_when(
  Type == 511 ~ 'outbound',
  Type == 711 ~ 'inbound',
  TRUE ~ 'NA') # this row point to Type code for other situations
) %>% pivot_wider(
  names_from = 'Type',
  values_from = 'Date'
)
# A tibble: 3 x 5
  Item  `Item#` Description outbound   inbound   
  <chr>   <dbl> <chr>       <chr>      <chr>     
1 A         100 Desc1       22/01/2022 20/03/2022
2 B         200 Desc2       15/01/2022 16/02/2022
3 A         300 Desc3       10/02/2022 NA

Sanjmeh · May 1, 2022, 1:48pm

Let me give a crisper solution using data.table & tidyverse without using specialised functions like pivot_longer.
Load the data:

library(data.table)
library(tidyverse)
dt <- data.table::data.table(
         Item = c("A", "B", "A", "B", "A"),
         ItemN = c(100L, 200L, 100L, 200L, 300L),
         Date = c("22/01/2022","15/01/2022","20/03/2022",
                  "16/02/2022","10/02/2022"),
         Type = c(511L, 511L, 711L, 711L, 511L),
  Description = c("Desc1", "Desc2", "Desc1", "Desc2", "Desc3")
)

And now the simple piped code giving the output:

dt[dt,on = .(Item,ItemN)][Type == 511 & Type !=i.Type] %>% 
setnames(c("Date","i.Date"), c("outDate","inDate")) %>% 
dplyr::select(-ends_with("Type"),-starts_with("Desc"))

The solution provided by @yifanliu is also correct. I just posted another one for fun

Cheers

abhi_jain80 · May 3, 2022, 11:45am

Many Thanks for providing the solution. Both your solutions are working fine for the provided dataset. Now I have got more columns in the dataset with millions of rows and I need to bring-in quantity as well in the output.
My bad, I am not been able to use your code to get the required output Can you please help?

Material	Type	Plant	Quantity	Item	ItemN	Date	Region	Description	User
MatA	511	ME11	10	A	100	22/01/2022	EMEA	Desc1	WL
MatB	511	ME11	5	B	200	15/01/2022	EMEA	Desc2	WI
MatA	711	ME21	5	A	100	20/03/2022	EMEA	Desc1	WL
MatB	711	ME21	5	B	200	16/02/2022	EMEA	Desc2	Wl
MatC	511	ME11	3	A	300	10/02/2022	EMEA	Desc3	WL
MatA	711	ME21	3	A	100	20/03/2022	EMEA	Desc1	WL
MatA	711	ME21	1	A	100	20/03/2022	EMEA	Desc1	WL

Required Output:

Item	ItemN	Description	Out_Date	In_Date	Out_Qty	In_Qty
A	100	Desc1	22/01/2022	20/03/2022	10	5
A	100	Desc1	22/01/2022	20/03/2022	10	5
A	100	Desc1	22/01/2022	20/03/2022	10	3
A	100	Desc1	22/01/2022	20/03/2022	10	1
B	200	Desc2	15/01/2022	16/02/2022	5	5

Thanks in advance...

nirgrahamuk · May 3, 2022, 12:02pm

extending yifanliu's solution:

library(tidyverse)

tibble::tribble(
  ~Item, ~`Item#`,        ~Date, ~Type, ~Description, ~quantity,
  "A",      100, "22/01/2022",   511,      "Desc1",        1L,
  "B",      200, "15/01/2022",   511,      "Desc2",        2L,
  "A",      100, "20/03/2022",   711,      "Desc1",        3L,
  "B",      200, "16/02/2022",   711,      "Desc2",        4L,
  "A",      300, "10/02/2022",   511,      "Desc3",        5L
)
df1 %>% mutate(Type = case_when(
  Type == 511 ~ 'outbound',
  Type == 711 ~ 'inbound',
  TRUE ~ 'NA') # this row point to Type code for other situations
) %>% pivot_wider(
  names_from = 'Type',
  values_from = c('Date','quantity')
)

abhi_jain80 · May 3, 2022, 1:15pm

Thanks @nirgrahamuk for your response. I am getting an error as:

Error in app$vspace(new_style$margin-top %||% 0) :
attempt to apply non-function

library(tidyverse)

df1 <- as_tibble(read.csv("sample.csv"))

df1 %>% mutate(Type = case_when(
  Type == 511 ~ 'outbound',
  Type == 711 ~ 'inbound',
  TRUE ~ 'NA') # this row point to Type code for other situations
) %>% pivot_wider(
  names_from = 'Type',
  values_from = c('Date','quantity')
)

The "sample.csv" file contains the data in csv format as the image attached. Not getting on what I am missing.

nirgrahamuk · May 3, 2022, 1:31pm

Your sample csv doesnt contain quality as my example did. You have Quantity.
I cant comment on the error you shared as it doesnt seem to relate to any of the code under discussion, or at least, you havent provided me a way to trigger that error.

yifanliu · May 3, 2022, 2:02pm

nirgrahamuk's solution is enough for use in my opinion, unless you only want a clean output as you required, some additional selection of cols and unnest of the output is needed.

df1 <- read_csv('sample.csv')

df1 %>% 
  select(-c(Material,Plant,Region,User)) %>% 
  mutate(Type = case_when(
  Type == 511 ~ 'outbound',
  Type == 711 ~ 'inbound',
  TRUE ~ 'NA')
) %>% pivot_wider(
  values_fn = list,
  names_from = 'Type',
  values_from = c('Date','Quantity')
) %>% unnest(matches("inbound|outbound"))

# A tibble: 5 x 7
  Item  ItemN Description Date_outbound Date_inbound Quantity_outbound Quantity_inbound
  <chr> <dbl> <chr>       <chr>         <chr>                    <dbl>            <dbl>
1 A       100 Desc1       22/01/2022    20/03/2022                  10                5
2 A       100 Desc1       22/01/2022    20/03/2022                  10                3
3 A       100 Desc1       22/01/2022    20/03/2022                  10                1
4 B       200 Desc2       15/01/2022    16/02/2022                   5                5
5 A       300 Desc3       10/02/2022    NA                           3               NA

In addition, because the sample data is getting more complex, I strongly recommend you figure out key data fields that are required to construct the result. For example, I see a column named Plant, constructing the inbound/outbound record for each plant may be more meaningful.

And I don't think that the error relates to code from our provided solution, too.

abhi_jain80 · May 3, 2022, 4:03pm

Thanks @yifanliu for the solution. This is what i need.
I got the error resolved. I was using the old package which I have updated now...

system · May 10, 2022, 4:04pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.