sql scrip done with mutate?? parent child R srip to power bi

larsjensen · January 16, 2019, 12:30pm

hej

i got this SQL scrtip that i need to run in power bi, but i cant get it to work, therefor im going to rewrite it in R to then change my table so i get the right result

the meaning of the sql scrip is to create a parent/child hierarchy.

UPDATE b SET
[Parent] = ISNULL
  (
   (
    SELECT
     MAX(a.[LedgerAccount])
    FROM
     [dbo].[DIM_LedgerAccounts] a
    WHERE
     a.[No_] < b.[No_]
     AND a.[Indentation] = b.[Indentation] - 1
     AND a.[Income_Balance] = b.[Income_Balance]
   )
   ,b.[LedgerAccount]
  )
FROM
[dbo].[DIM_LedgerAccounts] b
GO

this is what i have created in R atm but it is not working currently

 test7 <- test6 %>% 
  mutate(No_2 = No_) %>%
  mutate(Indentation2 = Indentation) %>% 
  mutate(Income_Balance2 = Income_Balance) %>% 
  mutate(parent = No_>No_2 && Indentation == Indentation2-1 && Income_Balance == Income_Balance2)

the ideer is to only have 1 tabel aswell therefor i am mutating the values in again and lates i will remove them.

i hope that someone can help me

the data will look something like this

No_	Income_Balance	Indentation
10000	1	0
11000	1	1
11100	1	2
11200	1	3
11400	1	3
11500	1	3
11600	1	3
11700	1	2
12000	1	2
12100	1	3
12200	1	3
12300	1	2
13000	1	2
13100	1	3
13200	1	3
13300	1	3
13350	1	3
13400	1	2
13500	1	2
13510	1	3
13540	1	2
14000	1	2
14100	1	3
14200	1	3
14300	1	3
14500	1	2
15950	1	1
16000	1	1
16100	1	2
16200	1	3
16210	1	3
16220	1	3
16300	1	3
16400	1	2
17000	1	2
17100	1	3
17110	1	3
17120	1	3
17200	1	3
17300	1	2
18000	1	2
18100	1	3
18110	1	3
18120	1	3
18200	1	3
18300	1	2
18950	1	1
19950	1	0
20000	1	0
21000	1	0
22000	1	1
22100	1	2
22150	1	2
22160	1	3
22190	1	2
22200	1	2
22300	1	3
22400	1	3
22425	1	3
22500	1	2
22510	1	2
22550	1	3
22590	1	2
22600	1	2
22700	1	3
22750	1	3
22790	1	2
22950	1	2
22960	1	3
22970	1	3
23000	1	3
23050	1	4
23100	1	4
23200	1	4
23300	1	4
23400	1	4
23500	1	4
23600	1	4
23700	1	4
23750	1	4
23775	1	4
23800	1	4
23900	1	3
24000	1	3
24200	1	4
24300	1	4
24400	1	3
24500	1	2
25000	1	2
25100	1	3
25200	1	3
25300	1	3
25400	1	2
25995	1	1
30000	1	1
30100	1	1
30200	1	1
30400	1	1
30500	1	1
39950	1	1
40000	0	1
44000	0	2
44100	0	3
44200	0	3
44300	0	3
44500	0	2
45000	0	2
45100	0	3
45200	0	3
45300	0	3
45999	0	2
49950	0	1
50000	0	1
52000	0	2
52100	0	3
52300	0	3
52400	0	3
52999	0	2
54000	0	2
54100	0	3
54400	0	3
54500	0	3
54702	0	3
54703	0	3
54710	0	3
54800	0	3
54999	0	2
59950	0	1
59999	1	1
60000	0	1
61000	0	2
61100	0	3
61150	0	3
61200	0	3
61250	0	3
61300	0	3
61350	0	3
61360	0	3
61400	0	2
62000	0	2
62100	0	3
62200	0	3
62300	0	3
62400	0	3
62500	0	3
62600	0	3
62700	0	3
62800	0	3
62900	0	3
62950	0	2
64000	0	2
64100	0	3
64200	0	3
64300	0	3
64400	0	2
65000	0	2
65100	0	3
65200	0	3
65300	0	3
65400	0	2
65500	0	2
65600	0	3
65700	0	3
65800	0	3
65900	0	2
66000	0	2
66100	0	3
66200	0	3
66300	0	3
66400	0	2
67000	0	2
67100	0	3
67200	0	3
67300	0	3
67400	0	3
67500	0	3
67600	0	2
69950	0	1
69999	0	1
70000	0	1
70100	0	2
70200	0	2
70260	0	2
70300	0	2
70400	0	2
70500	0	2
70510	0	2
70520	0	2
79950	0	1
80000	0	1
80100	0	2
80200	0	2
80300	0	2
80400	0	2
80455	0	2
80460	0	2
80470	0	2
80600	0	1
80700	0	1
80800	0	2
80900	0	2
81000	0	2
81100	0	2
81200	0	2
81300	0	1
84000	0	1
84100	0	2
84200	0	2
84300	0	1
99495	0	1
99999	0	1

jdlong · January 23, 2019, 6:48pm

let's start by decomposing this:

(
   (
    SELECT
     MAX(a.[LedgerAccount])
    FROM
     [dbo].[DIM_LedgerAccounts] a
    WHERE
     a.[No_] < b.[No_]
     AND a.[Indentation] = b.[Indentation] - 1
     AND a.[Income_Balance] = b.[Income_Balance]
   )
   ,b.[LedgerAccount]
  )
FROM
[dbo].[DIM_LedgerAccounts] b

So this appears to be a cartesian product between DIM_LedgerAccounts and itself. So maybe you can use the same logic in dplyr:

tbl_la <- tbl(con, DIM_LedgerAccounts) %>%
    mutate(key = 1) # add a fake key for the cartesian join

tbl_la %>%
    full_join(tbl_la, by = "key") %>%
    filter(No_.x < No_.y &
           Indentation.x = Indentation.y - 1 &
           Income_Balance.x = Income_Balance.y) ->
result

I don't have any way of testing this since you didn't provide example data, but that sort of points you in the general right direction. The only magic is the cartesian join using a fake key. Everything else is just translating your SQL to dplyr. Since the full_join above results in repeated column names, the ones on the left get auto appended with .x while the second set get .y. That's where those funny suffixes come from.

system · February 13, 2019, 6:48pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.