Subtract two columns of a CSV file into a new column, then find the 10 rows with the highest values in that column

I have two columns, starting_price and ending_price, in the file CARS_1.csv. I want to create another column named price, equal to ending_price - starting_price. Then I want to get the 10 rows with the highest price values. Help me, please!

I made up some data to test with:

set.seed(1)  # make the random example reproducible
df <- data.frame(
  starting_price = rnorm(10, mean = 40000, sd = 10000),
  ending_price   = rnorm(10, mean = 45000, sd = 15000)
)
df$price <- df$ending_price - df$starting_price
head(df[order(-df$price), ], 10)  # negative sign for descending order; head() keeps the top 10 rows
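To run the same base R steps on the actual file, here is a minimal sketch. It assumes CARS_1.csv has a header row and numeric starting_price and ending_price columns. So that the example runs on its own, a small temporary CSV with made-up values stands in for the real file:

```r
# Hypothetical stand-in for CARS_1.csv; with the real data use csv_path <- "CARS_1.csv"
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(starting_price = c(30000, 42000, 38000, 51000),
             ending_price   = c(35000, 40000, 47000, 61000)),
  csv_path, row.names = FALSE
)

cars <- read.csv(csv_path)                               # read the CSV into a data frame
cars$price <- cars$ending_price - cars$starting_price    # new column: ending minus starting
top10 <- head(cars[order(cars$price, decreasing = TRUE), ], 10)  # sort high to low, keep 10 rows
print(top10)
```

With only four made-up rows, all of them are returned. With the real file, head() caps the result at 10 rows.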

Hi @Irenephunn,

This is a good use case for the mutate() function from the dplyr package (part of the tidyverse).


library(dplyr)   # mutate(), filter(), arrange(), min_rank(), desc() and the %>% pipe
library(tibble)  # tibble()

# Dummy data standing in for CARS_1.csv; with the real file use
# cars_data <- read.csv("CARS_1.csv") instead
cars_data <- tibble(ending_price   = c(1000, 1200, 3000, 5000, 4000, 2000, 1000, 5000, 4000, 3000, 2000),
                    starting_price = c(800, 1100, 2700, 4500, 800, 1100, 800, 4500, 800, 1100, 4000))
# Modify the cars data
cars_data <- cars_data %>%
  mutate(price = ending_price - starting_price) %>% # create the price variable
  filter(min_rank(desc(price)) <= 10) %>%           # keep the top 10 prices (tied prices share a rank)
  arrange(desc(price))                              # sort the final output by price, high to low

I first create cars_data as an example dataset, which hopefully looks somewhat like yours. Then I modify cars_data with the mutate() function, which creates the price variable by subtracting starting_price from ending_price.

Since you only want the top 10 prices, you can use the filter() function, which keeps only the rows that satisfy the condition you put inside it. Here we first rank the prices in descending order, so the highest price gets rank 1 and the lowest gets a rank equal to the number of rows (11 in my example; yours will likely be longer). The <= 10 condition then keeps only the rows ranked 1 to 10.
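One subtlety in the ranking step is ties. Base R's rank() gives tied values their average rank by default, so two prices tied for 10th place both get rank 10.5 and are dropped. dplyr's min_rank() behaves like rank(ties.method = "min") and gives both the rank 10. The minimal base R sketch below uses hypothetical prices. Negating the vector plays the role of desc() for numeric values.

```r
# Hypothetical prices: the two 50s tie for 10th place
price <- c(900, 800, 700, 600, 500, 400, 300, 200, 100, 50, 50, 10)

# Default ties.method = "average": both tied values get rank 10.5
avg_rank <- rank(-price)
sum(avg_rank <= 10)       # 9 rows kept: the tied values are dropped

# ties.method = "min" (what dplyr::min_rank() does): both tied values get rank 10
min_rank_base <- rank(-price, ties.method = "min")
sum(min_rank_base <= 10)  # 11 rows kept: both tied values are included
```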

Finally, I used the arrange() function to sort the final data frame by price from high to low.
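As an alternative to the answer's approach, dplyr 1.0.0 and later provide slice_max(), which does the ranking and filtering in one step. This minimal sketch reuses the example data from above. Setting with_ties = FALSE is a choice made here so that exactly 10 rows come back even if prices tie at the cut-off. The default, with_ties = TRUE, can return more than 10 rows.

```r
library(dplyr)  # mutate(), slice_max(), arrange(), desc() and the %>% pipe

cars_data <- data.frame(
  ending_price   = c(1000, 1200, 3000, 5000, 4000, 2000, 1000, 5000, 4000, 3000, 2000),
  starting_price = c(800, 1100, 2700, 4500, 800, 1100, 800, 4500, 800, 1100, 4000)
)

top_cars <- cars_data %>%
  mutate(price = ending_price - starting_price) %>% # create the price variable
  slice_max(price, n = 10, with_ties = FALSE) %>%   # keep exactly the 10 highest prices
  arrange(desc(price))                              # make the high-to-low order explicit
print(top_cars)
```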

Hope this helps!
