R Program - Error - Kindly guide me to fix the same

SBJShree · December 14, 2024, 11:40am

Dear Experts,

I am learning R programming. I need your input to fix an error I am getting in one of my R programs. I would appreciate your help in resolving it. Thanks in advance.

Question:

a) In 5.R, load the air tibble from air.RData with load. Transform the tibble so that it includes the single row with the highest value in the emissions column for each county.
b) Save the resulting air tibble, using save, in a file called 5.RData.
c) Executing 5.R should create a tibble named air with 36 rows and 8 columns, sorted from highest to lowest value in the emissions column

My code:

# Load the air data
load("/workspaces/xxxxx/air/air.RData")
air$emissions <- suppressWarnings(as.numeric(as.character(air$emissions)))

# Ensure the air tibble exists after loading the data
if (!exists("air")) {
  stop("The air tibble does not exist!")
}

library(dplyr)
air <- as_tibble(air) %>%
  group_by(county) %>%
  # For each county, select the row with the maximum emissions
  slice(which.max(emissions)) %>%
  arrange(desc(emissions)) %>%
  ungroup()

# Save the resulting air tibble in 5.RData
save(air, file = "5.RData")

Error:
:( 5.RData contains air tibble with largest pollutant source for each county
air tibble does not contain highest emissions for each county

jrkrideau · December 14, 2024, 2:28pm

Hi, welcome to the forum.

Your code is really hard to read. It is much better to copy the code and paste it here between

This gives us formatted code that we can copy, paste, and run. Often a person here does not have the time to type out code to test it and find a problem.

That said, I think there is a space here between air and RData.

 load("/workspaces/xxxxx/air/air. RData")

You may find this helpful when asking questions here.

FAQ: How to do a minimal reproducible example (reprex) for beginners

SBJShree · December 14, 2024, 3:27pm

Thank you very much for your valuable time and reply. I have copied and pasted my code into my original post and attached a screenshot of my output.

The air.RData file has more than 32,000 records. My code filtered out 36 relevant records with 8 columns from those 32,000. However, R still throws an error. I tried my best to solve it, but in vain.
Kindly help me resolve this issue. Thank you very much.

mduvekot · December 14, 2024, 4:00pm

What is the error message?

jrkrideau · December 14, 2024, 4:07pm

Thanks, it is much easier to read your code now. However, there is an even better way to do it. I thought I had pasted in the instructions but it looks like I messed up the formatting. My apologies.

The best way to provide code is to copy it and paste it between
```

```

I think your code sequence is not optimal.

Let's try this bit of rearranged code and see what happens.

# Load packages. I am loading tidyverse. {dplyr} is part of it. ----------
library(tidyverse)

# Load data ---------------------------------------------------------------
load("/workspaces/xxxxx/air/air.RData")

# Check that the air data.frame/tibble exists and check the `emissions` structure ---
air
str(air$emissions)
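
One reason to look at `str(air$emissions)`: if the column came in as character rather than numeric, sorting is alphabetical, so "9" ranks above "100". Here is a minimal base-R sketch of that effect. The values are made up for illustration and are not taken from the air data.

```r
# Hypothetical values, stored as text the way a badly imported column might be
emissions_chr <- c("9", "100", "10")

# Character sorting compares the strings character by character
sorted_chr <- sort(emissions_chr, decreasing = TRUE)

# Converting to numeric first gives an ordering by magnitude
emissions_num <- as.numeric(emissions_chr)
sorted_num <- sort(emissions_num, decreasing = TRUE)

print(sorted_chr)
print(sorted_num)
```

If `str()` reports `chr` for the column, converting it before sorting or taking a maximum avoids this kind of surprise.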

Is the dataset you are using publicly available? If so, I might download it.

jrkrideau · December 14, 2024, 4:57pm

I think it is

RData contains air tibble with largest pollutant source for each county
air tibble does not contain highest emissions for each county

mduvekot · December 14, 2024, 5:19pm

It's from here, it seems: Northwest Air - CSCI E-5a

SBJShree · December 14, 2024, 5:26pm

yes I believe. I am not sure. I can provide the link below:

My code is for 5.R problem set.

jrkrideau · December 14, 2024, 5:32pm

Thanks to both @mduvekot and @SBJShree

It looks like everything downloaded nicely.

jrkrideau · December 14, 2024, 5:51pm

@SBJShree

You have your data in air.RData. Have you modified the variable names at all or done other manipulations to the existing air.csv file? The original air.csv looks like one of those blasted Excel exports.

Just look at the variable names.

 1:              State
 2:       State-County
 3:          POLLUTANT
 4:   Emissions (Tons)
 5:     Pollutant Type
 6:           SCC Code
 7:         EIS Sector
 8: Source Description
 9:        SCC LEVEL 1
10:        SCC LEVEL 2
11:        SCC LEVEL 3
12:        SCC LEVEL 4
13:         EPA Region
14:               FIPS

SBJShree · December 14, 2024, 6:13pm

No, I haven't modified the variable names at all. I created a tibble named 'air' itself and did the modifications.

Initially I tried to name the tibble 'Highest-emissions'. However, checkcs50 gave me an error saying

"air.RData does not exist"

To avoid this error I gave the same name to the tibble as 'air' itself.

mduvekot · December 14, 2024, 6:38pm

what does colnames(air) return?

jrkrideau · December 14, 2024, 7:19pm

Thanks, I am starting to see the problem. Somehow something has gotten confused in the data reading and saving. Can you post the code you were using to read in the data and "save" it? We might find where the problem is.

The first thing to note in my list of names above is that there is no variable "emissions". It looks like it is "Emissions (Tons)".

We need to clean up those horrible variable names and then save a new file. As far as I am concerned you do not want an RData file. Just keep everything as a .csv file. It's a lot simpler and IMHO RData files are an anachronism.

ACH. I may be a while getting back to you. Your project folder seems to have gotten corrupted; I may have to rebuild it.

jrkrideau · December 14, 2024, 8:55pm

I am still not sure what was happening but I think I have something that should help a bit. I am just getting you to the point where you have usable data. Remember though, there is no variable "emissions"; it is "emissions_tons".

# Load packages -----------------------------------------------------------
suppressMessages(library(data.table))
suppressMessages(library(tidyverse))
library(janitor)
library(here)

# Import data and tidy up name---------------------------------------------------------
dat1 <- read_csv("./raw_data/air.csv")
names(dat1)
dat1 <- dat1  %>%  clean_names()
names(dat1)

# Save to new file at main project level file  ----------------------------
write_csv(dat1, "air-rev.csv")

# Start what you were trying to do ----------------------------------------

air <- read_csv("air-rev.csv")

mduvekot · December 14, 2024, 10:05pm

Remember that the instructions say to use

state, renamed from State
county, renamed from State-County
pollutant, renamed from POLLUTANT
emissions, renamed from Emissions (Tons)
level_1, renamed from SCC LEVEL 1
level_2, renamed from SCC LEVEL 2
level_3, renamed from SCC LEVEL 3
level_4, renamed from SCC LEVEL 4

you can use colnames()

colnames(air) <- c("state", "county", "pollutant", "emissions", paste0("level_", 1:4))

or rename()

air <- air %>%  dplyr::rename(c(state = "State",
                       county = "State-County",
                       pollutant = "POLLUTANT",
                       emissions = "Emissions (Tons)",
                       level_1 = "SCC LEVEL 1",
                       level_2 = "SCC LEVEL 2",
                       level_3 = "SCC LEVEL 3",
                       level_4 = "SCC LEVEL 4"))
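
Once the columns carry those names, the transformation the exercise asks for can be sketched on a toy tibble. This is only an illustration under assumptions: `air_demo` and its values are made up, and `slice_max()` is swapped in here for the `slice(which.max(...))` call from the original post.

```r
library(dplyr)

# Made-up stand-in for the renamed air tibble (not the real data)
air_demo <- tibble(
  county    = c("A", "A", "B", "B", "C"),
  pollutant = c("CO", "NOx", "CO", "SO2", "CO"),
  emissions = c(5, 20, 7, 3, 50)
)

top_per_county <- air_demo %>%
  group_by(county) %>%
  # Keep exactly one row per county: the one with the largest emissions
  slice_max(emissions, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # Sort the surviving rows from highest to lowest emissions
  arrange(desc(emissions))

print(top_per_county)
```

`with_ties = FALSE` guarantees a single row per county even when two sources share the same maximum, which matters if a checker expects an exact row count.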

jrkrideau · December 14, 2024, 11:01pm

Maybe I should have read the instructions?

SBJShree · December 15, 2024, 3:57pm

Could any of the R experts kindly let me know what mistake I made to get this error? Thanks.

jrkrideau · December 15, 2024, 4:23pm

First of all we don't know for sure if you actually loaded the data. It looks like you did not but it's not completely clear. The message

"air.RData does not exist"

seems to indicate that you did not. Or you did successfully load the data but air.RData would not be the name of your dataset. It is the name of the storage file where the data is.

Without that code I don't think we can tell.

Second, if you did load the data all right, since you did not recode the variable names there was no variable "emissions" in your data set. That's what @mduvekot was talking about.
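
Both points can be checked directly. The sketch below uses a made-up data frame and a temporary file rather than the real air.RData, so the object contents and the `rdata_path` name are assumptions for illustration only. It shows that the .RData file name and the object name inside it are separate things, and how to confirm that a column exists before using it.

```r
# Made-up data frame standing in for the real air data
air <- data.frame(county = c("A", "B"), emissions = c(1.5, 2.5))

# Save it under a file name, then remove the object from the workspace
rdata_path <- tempfile(fileext = ".RData")
save(air, file = rdata_path)
rm(air)

# load() restores the objects and returns their names as a character vector
loaded_names <- load(rdata_path)
print(loaded_names)

# Check the object and the column before using them
stopifnot("air" %in% loaded_names)
has_emissions <- "emissions" %in% names(air)
print(has_emissions)
```

Running the same two checks against the real file would show right away whether an object called `air` was restored and whether it has an `emissions` column.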
