Error in `[.data.frame`(Data, , 4) : undefined columns selected

OOPS! I have a colnames() problem with achen.

The code looks like it should read

achen  <-   cbind(new_inventory[1 ,], x)
colnames(achen) <- tolower(c("Index","City", "Country", "Lat", "Lon", "Station Elevation", 
                             "value_tmin","value_tmax","value_tavg","V1", "V2"))

Yes, it should be. But I thought that Other_ID was used everywhere:
write_sef(df, outpath = outpath,variable="ta", cod=inventory$Other_ID[index], nam=station_name, lat=lat,
lon=lon, alt=alt, sou="BE", units="C", stat="mean", period="month")

Thank you for looking at this!

There may be something superfluous here yes! :see_no_evil:

I don't understand what you mean here: Why do I need to check Achen..
I don't get an error in Achen, but in Milan..
And what is the tolower-command?

#library(SEF) https://github.com/C3S-Data-Rescue-Lot1-WP3/SEF/wiki#r-package
library(dataresqc)
outpath = "C:/Users/elinl/Documents"

inventory <- read.table("C:/Users/elinl/Documents/UniBe/Instrumentelle data/Inventories/Inventory_Berkeley-Earth.txt", colClasses="character", header = FALSE)

colnames(inventory) <- c("Other_ID", "City", "Modern_Country", "Lat.degN", "Lon.degE", "Station_Elevation.m", "Start_Year", "End_Year" )

allfiles = list.files("C:/Users/elinl/Documents/Unibe/Instrumentelle data/Berkeley_Earth/Stations", full.names=TRUE)
for (currentfile in inventory$City) {

# currentfile now contains the name of the file to be read in

# fill in here the commands to be run for each file

index <- grep(currentfile, allfiles)
#read in data -the x
x <- read.table(allfiles[index], header = FALSE, fill=TRUE, sep="\t",stringsAsFactors=FALSE, na.strings="-9999")

colnames(x) <- c("Year", "Month", "value_tavg")
year<- x[,1]
months<- as.integer(x[, 2])
value_tavg <- x[, 3]

#inventory <- read.csv("Inventory_Canada.txt", sep=";", stringsAsFactors=F)
index <- which(inventory$City==currentfile)
lat <- inventory$Lat.degN[index]
lon <- inventory$Lon.degE[index]
alt <- inventory$Station_Elevation.m[index]
station_name <- inventory$City[index]

df <- data.frame(y=year, m=months, d="NA",
hh=rep("",nrow(x)), mm=rep("",nrow(x)), value_tavg, stringsAsFactors=FALSE)

cod=inventory$City[index]

write_sef(df, outpath = outpath,variable="ta", cod=inventory$Other_ID[index], nam=station_name, lat=lat,
lon=lon, alt=alt, sou="BE", units="C", stat="mean", period="month")

first_year <- min(df$y)
last_year <- max(df$y)
file.rename(from=paste0(outpath,"/", list.files(path=outpath, pattern=as.character(inventory$Other_ID[index]))),
to=paste0(outpath,"/", paste("BE",inventory$Other_ID[index],station_name,first_year,last_year,"ta_monthly.tsv", sep="_")))
}

Check SEF

for (f in list.files(outpath,pattern="_ta",full.names=TRUE)) check_sef(f)

The error message is now: Error in file(file, "rt") : invalid 'description' argument
It runs 12 stations and then it stops. The next station for me is Milan. Do you get different results?
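
One thing worth checking, as a guess rather than a diagnosis: file() reports "invalid 'description' argument" when it is handed anything other than exactly one file name, which can happen if grep(currentfile, allfiles) matches no file or more than one. The sketch below uses made-up file names in a temporary folder (not the real Berkeley Earth files), and read_one() is a hypothetical helper, to show both cases and a simple guard.

```r
# Hypothetical file names, created in a temporary folder for illustration only
tmp <- file.path(tempdir(), "stations_demo")
dir.create(tmp, showWarnings = FALSE)
demo_names <- c("Milan.txt", "Milano_Brera.txt", "Oslo.txt")
for (n in demo_names) {
  writeLines("1763\t1\t-0.919", file.path(tmp, n))
}
demo_files <- list.files(tmp, full.names = TRUE)

# grep() does pattern matching, so one city can match several files or none
hits_milan  <- grep("Milan", basename(demo_files))   # matches two files
hits_bergen <- grep("Bergen", basename(demo_files))  # matches no file

# Guard: only read when exactly one file matches, otherwise report and skip
read_one <- function(city, files) {
  hit <- grep(city, basename(files), fixed = TRUE)
  if (length(hit) != 1) {
    message(city, ": ", length(hit), " matching files, skipped")
    return(NULL)
  }
  read.table(files[hit], header = FALSE, sep = "\t", na.strings = "-9999")
}

oslo   <- read_one("Oslo", demo_files)    # one match, returns a data.frame
milan  <- read_one("Milan", demo_files)   # two matches, returns NULL
bergen <- read_one("Bergen", demo_files)  # no match, returns NULL
```

Printing length(grep(currentfile, allfiles)) inside the loop for the station where it stops would show whether this is what is happening.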

Indeed!
I cannot find a lot of documentation on dataresqc and what there is is not that helpful.

Sorry I forgot to mention the tolower command. I just find it easier to work with lowercase names so

tolower(c("Other_ID", "City", "Modern_Country", "Lat.degN", "Lon.degE",
                       "Station_Elevation.m", "Start_Year", "End_Year" ))

gives us this.

[1] "other_id"            "city"                "modern_country"      "lat.degn"           
[5] "lon.dege"            "station_elevation.m" "start_year"          "end_year"     

Can you tell me what the three columns are in the allfiles directory files? It looks like they are---from left to right---'year', 'month', 'temperature' ?

I wanted to know if that was the layout that you wanted for the files. It is usually easier to walk through each command without the loop for debugging.

You can run Milan
with

x <- read.table(allfiles[23], header = FALSE, fill=TRUE, 
                  sep="\t",stringsAsFactors=FALSE, na.strings="-9999")

milan  <-   cbind(new_inventory[23 ,], x)
colnames(milan) <- tolower(c("Index","City", "Country", "Lat", "Lon", "Station Elevation", 
                             "value_tmin","value_tmax","value_tavg","V1", "V2"))

milan

to see if it looks okay. I have the rows in new_inventory in the same alphabetical order as the filenames in allfiles so they can be indexed by row number.

I don't see why you should be getting an error at "Milan" either but R error messages can be very obscure at times and may come from something much earlier.

When you say you are having a problem with "Milan" does that mean you are getting usable output files for other stations?

I, frankly, cannot see how your code will work.

The more I look at your code the more I think you need to spend some time on very basic R to get a feel for R syntax and data handling. In some cases you are trying to do impossible things and in others the syntax just does not work. I suspect that you are thinking that R behaves like other computer languages and it does not. It does really crazy things at times.

Working from my simple example.

x <- read.table(allfiles[23], header = FALSE, fill=TRUE, 
                  sep="\t",stringsAsFactors=FALSE, na.strings="-9999")

gives us a data.frame with 3 columns.

Your command

colnames(x) <- c("City", "Lat", "Lon", "Station Elevation", "Date", "value_tmin","value_tmax","value_tavg","other")
Error in names(x) <- value : 
  'names' attribute [9] must be the same length as the vector [3]

cannot work as it is trying to assign 9 names to three columns.
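
A minimal sketch of checking the count before assigning names. The small data.frame is a stand-in for one station file, and the three names are the ones from your own loop.

```r
# Stand-in for one station file: three unnamed columns (values are illustrative)
x <- data.frame(V1 = c(1763L, 1763L), V2 = c(1L, 2L), V3 = c(-0.919, 5.618))

new_names <- c("Year", "Month", "value_tavg")

# Assign the names only when there is exactly one name per column
if (length(new_names) == ncol(x)) {
  colnames(x) <- new_names
} else {
  stop("Have ", ncol(x), " columns but ", length(new_names), " names")
}

str(x)
```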

This cannot work properly

months<- as.numeric(substr(x$Date,5,6))

as there is no Date variable with a character length of 6 in data.frame x. In fact I cannot see how there could be a variable called Date in x because of the naming problem.
The only thing that looks like a date is column 1, which is a 4-digit integer, for example 1763 in the first row of the Milan data. With a 4-digit date, substr() would return an empty string, "", because there are no characters at positions 5 and 6.

xx  <-  2020
 substr(xx,5 , 6) 
[1] ""

I am trying to decode the write_sef() statement at the moment and it is a bit confusing. To be honest a lot of the documentation is either confusing or non-existent. dataresqc looks to be relatively new. I think it would be worth an email to the package author or maintainer to see if they can offer some pointers.

I noticed that the sample data Bern that is supplied with dataresqc actually is not a data.frame but is a list with 2 data.frames, ta and p
Do

str(Bern)

to see what I mean.

I am sorry to be of so little help.

Thank you @jrkrideau
Yes, it is a little difficult for me because I don't have the R language in my bones. I have to process a lot of data (about 30,000 files) that will be converted into a special format called SEF. It is for my PhD thesis. It's hard for me because I'm not used to reading as much data as I do now. This is climate data from the 18th century onwards. I read from different databases. This type from Berkeley Earth is a little different from the others, because the data are arranged vertically.
The columns in the file are year, month and average temperature per month.

Yes, that is correct :ok_hand:

I spent the first week with R thinking that my brain was turning in my skull. :smiley:

Welcome to R


I see, that's good! :+1:

Mistake earlier: "I have the rows in inventory in the same alphabetical order as the filenames in allfiles". That should be new_inventory. Will edit earlier message.

This is in the inventory, the file that holds the information about the station. The inventory has 8 columns.

The other files (x) have 3 columns.

Thank you. I have checked the file a thousand times...

I share an office with him in Bern, but I am now sitting in Oslo and I am stuck here. I have asked him a thousand times, and he is not so communicative (sorry to say).

Can you recommend a course for me?

Here is little more info about this: https://c3s-data-rescue-lot1-wp3.github.io/SEF/

Here there are both (ta = temperature and p = air pressure). Those are meteorological codes. I am running the temperature first, so this is the data frame for temperature (ta).

Ouch! Bribery? Torture? Both?

Not really, I started long enough ago that there were no courses so I don't really know about what is available today.

I think I would recommend Learn the tidyverse or some of the tutorials available at the R site. Have a look at the Manuals and Contributed materials on https://cran.r-project.org/

Note that the tidyverse is a special subset of the overall R world but very useful and some of the packages there make things a lot easier to do. For example I used the arrange() command from the dplyr package in tidyverse as it is a lot easier to use than the order command in base R
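
For comparison, a sketch of the same sort in base R, using a made-up three-row inventory. The commented line is the dplyr version.

```r
# Made-up inventory rows for illustration
inventory <- data.frame(other_id = c("3", "1", "2"),
                        city = c("Oslo", "Bern", "Milan"),
                        stringsAsFactors = FALSE)

# Base R: order() returns the row positions that sort the city column
new_inventory <- inventory[order(inventory$city), ]

# dplyr equivalent (needs library(dplyr)):
# new_inventory <- arrange(inventory, city)

new_inventory$city
```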

The key thing to do is get a feeling for data structures. Are you dealing with a vector, a matrix, a data frame, etc.? Also you need to know the data format: numeric (various flavours), character, factor?

It may just be me but I find it is very useful to use the str command a lot.
For example str(x) in my example will show you that you have

'data.frame':	2951 obs. of  3 variables:
 $ V1: int  1763 1763 1763 1763 1763 1763 1763 1763 1763 1763 ...
 $ V2: int  1 2 3 4 5 6 7 8 9 10 ...
 $ V3: num  -0.919 5.618 7.203 12.953 15.574 ...

so you have a data.frame with three numeric columns.

Ah, obvious once someone tells me; I am not a climatologist. So in your current case write_sef() gives you a one-dimensional list with a data.frame. Sounds fine to me.

For some reason I missed that one. I'll have a look. Thanks

That link to the example has helped a lot but I have run into a problem. If you have a look at https://rdrr.io/cran/dataresqc/man/write_sef.html
it says : Data: A data frame with 6 variables in this order: year, month, day, hour, minute, value.

After I got everything set to run write_sef on my test data.frame achen it bombed because I could only supply data for 3 variables, year, month and value.

I stuck together a tiny data.frame with 6 variables and write_sef ran successfully. It looks like "day", "hour" and "minute" are required.

From a programming point of view, I do not think this is a problem. We can just set the date to be something like "year-month-01 07:00" but what does that do to the data?

In any case here is the mess I have so far. I am pulling all the information such as Name, Lat & Lon from the Achen file but using the dat1 file to generate the rest.


dat1  <-  structure(list(Year = c(1816L, 1816L, 1816L, 1816L, 1816L, 1816L
), Month = c(1L, 1L, 1L, 1L, 1L, 1L), Day = c(1L, 1L, 1L, 2L, 
2L, 2L), Hour = c(7L, 13L, 16L, 7L, 13L, 16L), Minute = c(30L, 
0L, 36L, 30L, 0L, 36L), Value = c(14, 30, 24.5, 17, 24.5, 20.5
)), class = "data.frame", row.names = c(NA, -6L))


setwd("/home/john/RJunk/elinlun")


library(tidyverse)
library(dataresqc)

inventory <-  read.csv("Inventory_Berkeley_Earth.csv", sep = "\t", header = FALSE)

colnames(inventory) <- tolower(c("Other_ID", "City", "Modern_Country", "Lat.degN", "Lon.degE",
                         "Station_Elevation.m", "Start_Year", "End_Year" ))

new_inventory  <-  arrange(inventory, city)

allfiles = (list.files("allfiles", full.names = TRUE))

outpath  <-  "/home/john/RJunk/elinlun/output"
variable = "ta"
lat  <-  new_inventory[1,][4]
lon  <-  new_inventory[1,][5]
alt  <-  new_inventory[1,][6]
cod  <-  new_inventory[1,][1]
id   <-  new_inventory[1,][1]
nam  <-  new_inventory[1,][2]
units  <-  "C"
sou  <-  "Berkley"


write_sef(dat1, outpath = outpath, variable = variable, cod = cod,
          nam = nam, lat = lat, lon = lon, alt = alt, stat = "point", units = "C")

Dear John,

Thank you (I am sorry for late answer. I had the weekend off).

But does this statement make any sense when I want d = NA (as integer (1-31)), hour = NA (as integer (0-24)), minute = NA (as integer (0-59))?
df <- data.frame(y=year, m=months, d="NA",
hh=rep("",nrow(x)), mm=rep("",nrow(x)), value_tavg, stringsAsFactors=FALSE)
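
To make the difference concrete: d="NA" stores the two-letter text "NA", and "" stores an empty string, so those three columns become character rather than integer. Below is a minimal sketch of a data frame with real missing values, using a few monthly rows as a stand-in for x. Whether write_sef() accepts missing day, hour and minute for monthly data is an assumption I have not verified.

```r
# A few monthly rows (year, month, mean temperature) standing in for x
x <- data.frame(Year = c(1829L, 1829L, 1829L),
                Month = c(6L, 7L, 8L),
                value_tavg = c(17.7, 16.6, 16))

n <- nrow(x)

# NA_integer_ is a real missing value of type integer; "NA" and "" are text
df <- data.frame(y = x$Year,
                 m = x$Month,
                 d = rep(NA_integer_, n),
                 hh = rep(NA_integer_, n),
                 mm = rep(NA_integer_, n),
                 value = x$value_tavg,
                 stringsAsFactors = FALSE)

str(df)             # d, hh and mm show up as int with NA values
is.na("NA")         # FALSE: this is ordinary text
is.na(NA_integer_)  # TRUE: this is a missing value
```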

I have only one temperature per month (the mean temperature):

value_tavg <- x[, 3]
Elin

Thank you for the feedback here. I have bought a couple of R books now, so maybe they will help me. I have also asked for a course.
I am not sure what I am dealing with.
The Achen file lists the monthly mean data in a vertical format.

1829 6 17.7
1829 7 16.6
1829 8 16
1829 9 13.6
1829 10 9.3
1829 11 3.7
1829 12 -3.3
Year month value_tavg (=temperature average or mean)

Is that the vector?

This will also be the dataframe (I think).
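
On the vector question, a small sketch using a few of the Achen rows quoted above: each column on its own is a vector, and the table that holds the three columns side by side is a data frame. The object names are only for illustration.

```r
# Each column by itself is a vector (all values of one type)
year       <- c(1829L, 1829L, 1829L, 1829L)
month      <- c(6L, 7L, 8L, 9L)
value_tavg <- c(17.7, 16.6, 16, 13.6)

# Putting equal-length vectors side by side gives a data frame
achen <- data.frame(year, month, value_tavg)

is.vector(value_tavg)   # TRUE
is.data.frame(achen)    # TRUE
is.vector(achen[, 3])   # TRUE: one column pulled out is a vector again
str(achen)
```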