top_n by group does not give expected result.

My code is as follows:

kazex5 <- kazex4 %>%
select(YEAR, Product, Export, Product2, Export2) %>%
group_by(YEAR, Product2) %>%
arrange(YEAR, -Export2, -Export) %>%
top_n(3, wt = Product2)

But, it does not give the desired slicing of data based on column Product2.
My data is as follows:

structure(list(YEAR = c(2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L), Product = c("Petroleum oils, oils from bitumin. materials, crude",
"Copper", "Pig iron & spiegeleisen, sponge iron, powder & granu",
"Petroleum oils or bituminous minerals > 70 % oil", "Wheat (including spelt) and meslin, unmilled",
"Flat-rolled prod., iron, non-alloy steel, not coated", "Zinc",
"Radio-actives and associated materials", "Iron ore and concentrates",
"Natural gas, whether or not liquefied", "Flat-rolled prod., iron, non-alloy steel, coated, clad",
"Ores and concentrates of base metals, n.e.s.", "Aluminium ores and concentrates (incl. alumina)",
"Coal, whether or not pulverized, not agglomerated", "Liquefied propane and butane",
"Aircraft & associated equipment; spacecraft, etc.", "Gold, non-monetary (excluding gold ores and concentrates)",
"Leather", "Meal and flour of wheat and flour of meslin", "Ferrous waste, scrape; remelting ingots, iron, steel",
"Ingots, primary forms, of iron or steel; semi-finis.", "Silver, platinum, other metals of the platinum group",
"Residual petroleum products, n.e.s., related mater.", "Copper ores and concentrates; copper mattes, cemen",
"Lead", "Cotton", "Inorganic chemical elements, oxides & halogen salts",
"Miscellaneous no-ferrous base metals for metallur.", "Barley, unmilled",
"Ball or roller bearings", "Tubes, pipes & hollow profiles, fittings, iron, steel",
"Wool and other animal hair (incl. wool tops)", "Other crude minerals",
"Fish, fresh (live or dead), chilled or frozen", "Sulphur and unroasted iron pyrites",
"Ships, boats & floating structures", "Vegetables", "Manufactures of base metal, n.e.s.",
"Pigments, paints, varnishes and related materials", "Fruits and nuts (excluding oil nuts), fresh or dried",
"Electrical machinery & apparatus, n.e.s.", "Electric current",
"Metallic salts & peroxysalts, of inorganic acids", "Fertilizers (other than those of group 272)",
"Other inorganic chemicals", "Flat-rolled products of alloy steel",
"Tobacco, manufactured", "Petroleum gases, other gaseous hydrocarbons, n.e.s.",
"Civil engineering & contractors' plant & equipment", "Feeding stuff for animals (no unmilled cereals)"
), Export = c(28125875.547, 2774753.829, 1420761.85, 1372686.455,
1170506.955, 1030047.529, 1009023.697, 861280.967, 795650.888,
654776.104, 577914.585, 531876.982, 522493.466, 521783.504, 460815.451,
457163.028, 398898.941, 345676.251, 339222.406, 338749.019, 327587.704,
323202.372, 279972.276, 260366.854, 244857.889, 183686.003, 183079.171,
156656.264, 111366.105, 96351.945, 87312.682, 84810.818, 81225.326,
75315.252, 69763.087, 68704.15, 60793.747, 56920.456, 54593.508,
53949.055, 52598.13, 50108.263, 45667.071, 40731.426, 39928.831,
39790.575, 35430.679, 33444.216, 32675.558, 32171.988), Product2 = c("Petroleum",
"Copper", "Iron_Steel", "Petroleum", "Wheat_Flour", "Iron_Steel",
"Zinc", "Radioactives", "Iron_Steel", "Natural_Gas", "Iron_Steel",
"Ores_metals", "Ores_metals", "Coal", "LPG", "Machinery", "Ores_metals",
"Leather", "Wheat_Flour", "Iron_Steel", "Iron_Steel", "Jewellery_Bullion",
"Petroleum", "Ores_metals", "Lead", "Cotton", "Chemicals", "Miscellaneous",
"Barley", "Iron_Steel", "Machinery", "Meat_Animals", "Petroleum",
"Fish", "Iron_Steel", "Vehicles", "Vegetables", "Ores_metals",
"Chemicals", "Fruits", "Machinery", "Electric current", "Chemicals",
"Chemicals", "Chemicals", "Iron_Steel", "Tobacco", "Petroleum",
"Vehicles", "Meat_Animals"), Export2 = c(130828, 24150, 205281,
130828, 88585, 205281, 24150, 24150, 205281, 24150, 205281, 136835,
136835, 36220, 24150, 90553, 136835, 8034, 88585, 205281, 205281,
30199, 130828, 136835, 24150, 24150, 70402, 24150, 24150, 205281,
90553, 24138, 130828, 24150, 205281, 40248, 12075, 136835, 70402,
10059, 90553, 16105, 70402, 70402, 70402, 205281, 22142, 130828,
40248, 24138)), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))

Hi @ambijat . Can you please state what your expected result is? Do you want the top 3 items for each value in Product2? Which quantitative variable (Export or Export2) should be used as the selection criterion?

Ideally the output should be like this.

structure(list(YEAR = c(2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L), Product = c("Petroleum oils, oils from bitumin. materials, crude",
"Copper", "Pig iron & spiegeleisen, sponge iron, powder & granu",
"Petroleum oils or bituminous minerals > 70 % oil", "Wheat (including spelt) and meslin, unmilled",
"Flat-rolled prod., iron, non-alloy steel, not coated", "Zinc",
"Radio-actives and associated materials", "Iron ore and concentrates",
"Natural gas, whether or not liquefied", "Flat-rolled prod., iron, non-alloy steel, coated, clad",
"Ores and concentrates of base metals, n.e.s.", "Aluminium ores and concentrates (incl. alumina)",
"Coal, whether or not pulverized, not agglomerated", "Liquefied propane and butane",
"Aircraft & associated equipment; spacecraft, etc.", "Gold, non-monetary (excluding gold ores and concentrates)",
"Leather", "Meal and flour of wheat and flour of meslin"), Export = c(28125875.547,
2774753.829, 1420761.85, 1372686.455, 1170506.955, 1030047.529,
1009023.697, 861280.967, 795650.888, 654776.104, 577914.585,
531876.982, 522493.466, 521783.504, 460815.451, 457163.028, 398898.941,
345676.251, 339222.406), Product2 = c("Petroleum", "Copper",
"Iron_Steel", "Petroleum", "Wheat_Flour", "Iron_Steel", "Zinc",
"Radioactives", "Iron_Steel", "Natural_Gas", "Iron_Steel", "Ores_metals",
"Ores_metals", "Coal", "LPG", "Machinery", "Ores_metals", "Leather",
"Wheat_Flour"), Export2 = c(130828, 24150, 205281, 130828, 88585,
205281, 24150, 24150, 205281, 24150, 205281, 136835, 136835,
36220, 24150, 90553, 136835, 8034, 88585)), row.names = c(NA,
-19L), class = c("tbl_df", "tbl", "data.frame"))

Since it is just for one year as entire 10 year data is not possible to dput here.

The Export2 corresponds to Product2, So to get top_n for top 3 Product2, according to my understanding should be Export2

Perhaps I'm misunderstanding you because your explanation doesn't align with the sample expected output that you've shared.

Each item in Product2 has the same value in Export2 so using the latter to find top_n doesn't make sense. On the other hand, you could find top_n by the Export variable like so:

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3

kazex4 <- structure(list(YEAR = c(2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
                                  2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
                                  2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
                                  2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
                                  2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
                                  2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L), Product = c("Petroleum oils, oils from bitumin. materials, crude",
                                                                                                       "Copper", "Pig iron & spiegeleisen, sponge iron, powder & granu",
                                                                                                       "Petroleum oils or bituminous minerals > 70 % oil", "Wheat (including spelt) and meslin, unmilled",
                                                                                                       "Flat-rolled prod., iron, non-alloy steel, not coated", "Zinc",
                                                                                                       "Radio-actives and associated materials", "Iron ore and concentrates",
                                                                                                       "Natural gas, whether or not liquefied", "Flat-rolled prod., iron, non-alloy steel, coated, clad",
                                                                                                       "Ores and concentrates of base metals, n.e.s.", "Aluminium ores and concentrates (incl. alumina)",
                                                                                                       "Coal, whether or not pulverized, not agglomerated", "Liquefied propane and butane",
                                                                                                       "Aircraft & associated equipment; spacecraft, etc.", "Gold, non-monetary (excluding gold ores and concentrates)",
                                                                                                       "Leather", "Meal and flour of wheat and flour of meslin", "Ferrous waste, scrape; remelting ingots, iron, steel",
                                                                                                       "Ingots, primary forms, of iron or steel; semi-finis.", "Silver, platinum, other metals of the platinum group",
                                                                                                       "Residual petroleum products, n.e.s., related mater.", "Copper ores and concentrates; copper mattes, cemen",
                                                                                                       "Lead", "Cotton", "Inorganic chemical elements, oxides & halogen salts",
                                                                                                       "Miscellaneous no-ferrous base metals for metallur.", "Barley, unmilled",
                                                                                                       "Ball or roller bearings", "Tubes, pipes & hollow profiles, fittings, iron, steel",
                                                                                                       "Wool and other animal hair (incl. wool tops)", "Other crude minerals",
                                                                                                       "Fish, fresh (live or dead), chilled or frozen", "Sulphur and unroasted iron pyrites",
                                                                                                       "Ships, boats & floating structures", "Vegetables", "Manufactures of base metal, n.e.s.",
                                                                                                       "Pigments, paints, varnishes and related materials", "Fruits and nuts (excluding oil nuts), fresh or dried",
                                                                                                       "Electrical machinery & apparatus, n.e.s.", "Electric current",
                                                                                                       "Metallic salts & peroxysalts, of inorganic acids", "Fertilizers (other than those of group 272)",
                                                                                                       "Other inorganic chemicals", "Flat-rolled products of alloy steel",
                                                                                                       "Tobacco, manufactured", "Petroleum gases, other gaseous hydrocarbons, n.e.s.",
                                                                                                       "Civil engineering & contractors' plant & equipment", "Feeding stuff for animals (no unmilled cereals)"
                                  ), Export = c(28125875.547, 2774753.829, 1420761.85, 1372686.455,
                                                1170506.955, 1030047.529, 1009023.697, 861280.967, 795650.888,
                                                654776.104, 577914.585, 531876.982, 522493.466, 521783.504, 460815.451,
                                                457163.028, 398898.941, 345676.251, 339222.406, 338749.019, 327587.704,
                                                323202.372, 279972.276, 260366.854, 244857.889, 183686.003, 183079.171,
                                                156656.264, 111366.105, 96351.945, 87312.682, 84810.818, 81225.326,
                                                75315.252, 69763.087, 68704.15, 60793.747, 56920.456, 54593.508,
                                                53949.055, 52598.13, 50108.263, 45667.071, 40731.426, 39928.831,
                                                39790.575, 35430.679, 33444.216, 32675.558, 32171.988), Product2 = c("Petroleum",
                                                                                                                     "Copper", "Iron_Steel", "Petroleum", "Wheat_Flour", "Iron_Steel",
                                                                                                                     "Zinc", "Radioactives", "Iron_Steel", "Natural_Gas", "Iron_Steel",
                                                                                                                     "Ores_metals", "Ores_metals", "Coal", "LPG", "Machinery", "Ores_metals",
                                                                                                                     "Leather", "Wheat_Flour", "Iron_Steel", "Iron_Steel", "Jewellery_Bullion",
                                                                                                                     "Petroleum", "Ores_metals", "Lead", "Cotton", "Chemicals", "Miscellaneous",
                                                                                                                     "Barley", "Iron_Steel", "Machinery", "Meat_Animals", "Petroleum",
                                                                                                                     "Fish", "Iron_Steel", "Vehicles", "Vegetables", "Ores_metals",
                                                                                                                     "Chemicals", "Fruits", "Machinery", "Electric current", "Chemicals",
                                                                                                                     "Chemicals", "Chemicals", "Iron_Steel", "Tobacco", "Petroleum",
                                                                                                                     "Vehicles", "Meat_Animals"), Export2 = c(130828, 24150, 205281,
                                                                                                                                                              130828, 88585, 205281, 24150, 24150, 205281, 24150, 205281, 136835,
                                                                                                                                                              136835, 36220, 24150, 90553, 136835, 8034, 88585, 205281, 205281,
                                                                                                                                                              30199, 130828, 136835, 24150, 24150, 70402, 24150, 24150, 205281,
                                                                                                                                                              90553, 24138, 130828, 24150, 205281, 40248, 12075, 136835, 70402,
                                                                                                                                                              10059, 90553, 16105, 70402, 70402, 70402, 205281, 22142, 130828,
                                                                                                                                                              40248, 24138)), row.names = c(NA, -50L), class = c("tbl_df",
                                                                                                                                                                                                                 "tbl", "data.frame"))

kazex4 %>%
  group_by(YEAR, Product2) %>%
  top_n(3, wt = Export) %>% 
  arrange(Product2)
#> # A tibble: 38 x 5
#> # Groups:   YEAR, Product2 [25]
#>     YEAR Product                                    Export Product2      Export2
#>    <int> <chr>                                       <dbl> <chr>           <dbl>
#>  1  2007 Barley, unmilled                          111366. Barley          24150
#>  2  2007 Inorganic chemical elements, oxides & h~  183079. Chemicals       70402
#>  3  2007 Pigments, paints, varnishes and related~   54594. Chemicals       70402
#>  4  2007 Metallic salts & peroxysalts, of inorga~   45667. Chemicals       70402
#>  5  2007 Coal, whether or not pulverized, not ag~  521784. Coal            36220
#>  6  2007 Copper                                   2774754. Copper          24150
#>  7  2007 Cotton                                    183686. Cotton          24150
#>  8  2007 Electric current                           50108. Electric cur~   16105
#>  9  2007 Fish, fresh (live or dead), chilled or ~   75315. Fish            24150
#> 10  2007 Fruits and nuts (excluding oil nuts), f~   53949. Fruits          10059
#> # ... with 28 more rows

Created on 2020-04-08 by the reprex package (v0.3.0)

Is this what you are trying to achieve? Let me know if it isn't so that I can help you further.

Export will slice the data by 3 irrespective of any grouping however, I have found a way out. You can apply my code as below.

kazex5 <- kazex4 %>%
select(YEAR, Product, Export, Product2, Export2) %>%
group_by(YEAR, Product2, Product) %>%
arrange(YEAR, -Export2, -Export) %>%
mutate(Rank = dense_rank(Export2)) %>%
mutate(Order = as.integer(Rank-12)) %>%
filter(Order >= 1)%>%
select(YEAR, Product, Export, Product2, Export2, Order) %>%
ungroup()

This exactly what I want. And, it is like going the longer way to get the desired output.

That code results in a 0-row tibble for me. But if it solved your problem, that's great.

My output of kazex5 is follows:

structure(list(YEAR = c(2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L), Product = c("Pig iron & spiegeleisen, sponge iron, powder & granu",
"Flat-rolled prod., iron, non-alloy steel, not coated", "Iron ore and concentrates",
"Flat-rolled prod., iron, non-alloy steel, coated, clad", "Ferrous waste, scrape; remelting ingots, iron, steel",
"Ingots, primary forms, of iron or steel; semi-finis.", "Ball or roller bearings",
"Sulphur and unroasted iron pyrites", "Flat-rolled products of alloy steel",
"Ores and concentrates of base metals, n.e.s.", "Aluminium ores and concentrates (incl. alumina)",
"Gold, non-monetary (excluding gold ores and concentrates)",
"Copper ores and concentrates; copper mattes, cemen", "Manufactures of base metal, n.e.s.",
"Petroleum oils, oils from bitumin. materials, crude", "Petroleum oils or bituminous minerals > 70 % oil",
"Residual petroleum products, n.e.s., related mater.", "Other crude minerals",
"Petroleum gases, other gaseous hydrocarbons, n.e.s.", "Aircraft & associated equipment; spacecraft, etc.",
"Tubes, pipes & hollow profiles, fittings, iron, steel", "Electrical machinery & apparatus, n.e.s.",
"Wheat (including spelt) and meslin, unmilled", "Meal and flour of wheat and flour of meslin",
"Inorganic chemical elements, oxides & halogen salts", "Pigments, paints, varnishes and related materials",
"Metallic salts & peroxysalts, of inorganic acids", "Fertilizers (other than those of group 272)",
"Other inorganic chemicals", "Ships, boats & floating structures",
"Civil engineering & contractors' plant & equipment", "Coal, whether or not pulverized, not agglomerated",
"Silver, platinum, other metals of the platinum group", "Copper",
"Zinc", "Radio-actives and associated materials", "Natural gas, whether or not liquefied",
"Liquefied propane and butane", "Lead", "Cotton", "Miscellaneous no-ferrous base metals for metallur.",
"Barley, unmilled", "Fish, fresh (live or dead), chilled or frozen"
), Export = c(1420761.85, 1030047.529, 795650.888, 577914.585,
338749.019, 327587.704, 96351.945, 69763.087, 39790.575, 531876.982,
522493.466, 398898.941, 260366.854, 56920.456, 28125875.547,
1372686.455, 279972.276, 81225.326, 33444.216, 457163.028, 87312.682,
52598.13, 1170506.955, 339222.406, 183079.171, 54593.508, 45667.071,
40731.426, 39928.831, 68704.15, 32675.558, 521783.504, 323202.372,
2774753.829, 1009023.697, 861280.967, 654776.104, 460815.451,
244857.889, 183686.003, 156656.264, 111366.105, 75315.252), Product2 = c("Iron_Steel",
"Iron_Steel", "Iron_Steel", "Iron_Steel", "Iron_Steel", "Iron_Steel",
"Iron_Steel", "Iron_Steel", "Iron_Steel", "Ores_metals", "Ores_metals",
"Ores_metals", "Ores_metals", "Ores_metals", "Petroleum", "Petroleum",
"Petroleum", "Petroleum", "Petroleum", "Machinery", "Machinery",
"Machinery", "Wheat_Flour", "Wheat_Flour", "Chemicals", "Chemicals",
"Chemicals", "Chemicals", "Chemicals", "Vehicles", "Vehicles",
"Coal", "Jewellery_Bullion", "Copper", "Zinc", "Radioactives",
"Natural_Gas", "LPG", "Lead", "Cotton", "Miscellaneous", "Barley",
"Fish"), Export2 = c(205281, 205281, 205281, 205281, 205281,
205281, 205281, 205281, 205281, 136835, 136835, 136835, 136835,
136835, 130828, 130828, 130828, 130828, 130828, 90553, 90553,
90553, 88585, 88585, 70402, 70402, 70402, 70402, 70402, 40248,
40248, 36220, 30199, 24150, 24150, 24150, 24150, 24150, 24150,
24150, 24150, 24150, 24150), Order = c(10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L, 9L, 8L, 8L, 8L, 8L, 8L, 7L,
7L, 7L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 3L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, -43L), class = c("tbl_df",
"tbl", "data.frame"))

Can you break your run at arrange and second mutate to see why it a 0-tibble for you.
And, I have given a re-run to the whole script just to ensure the results are the same and it is same.

There's no need to put in effort to debug why it doesn't work on my end. The objective is to get it working for you and we've done that. :slightly_smiling_face:

1 Like

Can we get Export2 as an aggregated Sum of Product2. I have tried this code.

kazex4$Export2 <- as.numeric(ave(kazex4$YEAR, kazex4$Product2, FUN=sum))

But there seems to be a fault in this. As, manual sampling tells me sums are not as per group Product2.

You could use summarize().

kazex4 %>% 
  group_by(Product2) %>% 
  summarize(Total_Export2 = sum(Export2))

# A tibble: 25 x 2
   Product2          Total_Export2
   <chr>                     <dbl>
 1 Barley                    24150
 2 Chemicals                352010
 3 Coal                      36220
 4 Copper                    24150
 5 Cotton                    24150
 6 Electric current          16105
 7 Fish                      24150
 8 Fruits                    10059
 9 Iron_Steel              1847529
10 Jewellery_Bullion         30199
# ... with 15 more rows

The it can work for only one year and besides I have to paste the Total_Export2 value against each Product value. In short it is to create a new column showing group sum of Product2 against each Product2 calcuated from 2 conditions of Year, Product2.

Not necessarily. You can add any number of variables into group_by() to get totals by each combination of those values. For example, if you want totals by YEAR and Product2, just change the call to:

kazex4 %>% 
  group_by(YEAR, Product2) %>% 
  summarize(Total_Export2 = sum(Export2))

# A tibble: 25 x 3
# Groups:   YEAR [1]
    YEAR Product2          Total_Export2
   <int> <chr>                     <dbl>
 1  2007 Barley                    24150
 2  2007 Chemicals                352010
 3  2007 Coal                      36220
 4  2007 Copper                    24150
 5  2007 Cotton                    24150
 6  2007 Electric current          16105
 7  2007 Fish                      24150
 8  2007 Fruits                    10059
 9  2007 Iron_Steel              1847529
10  2007 Jewellery_Bullion         30199
# ... with 15 more rows
1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.