Error in `[.data.frame`(Data, , 4) : undefined columns selected

jrkrideau · February 1, 2021, 2:26pm

I took most of the weekend off but I am shocked! Ph.D students are allowed to take a weekend off?

I have also asked for a course.
There are some online courses. Searching for "Internet courses R Statistics" brings up several, but I am afraid I do not know anything about them.

If you can get one with a live instructor you probably would be better off.

I am going to have to think a bit about this code but I think you are correct in general though the syntax looks wrong.

d = NA (as integer (1-31)), hour = NA (as integer(0-24)), 
Minute = NA (as integer(0-59)):
df <- data.frame(y=year, m=months, d="NA",
hh=rep("",nrow(x)), mm=rep("",nrow(x)), 
value_tavg, stringsAsFactors=FALSE)

The Achen file lists the monthly mean data in a vertical format.

Is that the vector?

Achen is the data.frame. Each column of the data.frame is a vector.

So if we say

achen  <-  read.csv("Achen.txt", sep = "\t")
names(achen)  <-  c("year", "month", "value_tavg")

We can then say

temperature  <-  achen$value_tavg
temperature

which gives us a vector with the "value_tavg" numbers.
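
A minimal sketch of the same idea that runs without the file. The three rows below are invented values, not the real Achen data, and same_thing is just an illustrative name.

```r
# Toy stand-in for the Achen file (values are invented for illustration)
achen <- data.frame(year = c(1774, 1774, 1774),
                    month = c(10, 11, 12),
                    value_tavg = c(9.1, 4.3, 0.8))

# Each column of the data.frame is a vector; `$` and `[[ ]]` both pull one out
temperature <- achen$value_tavg
same_thing  <- achen[["value_tavg"]]

is.vector(temperature)   # TRUE
is.data.frame(achen)     # TRUE
```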

A quick internet search turns up a couple of sites that may help you understand data structures.

Data structures · Advanced R.

Data Structures in R Programming - Types and Syntax

Just in case you encounter one, a tibble is basically a data.frame for our purposes. It's complicated. (Nobody said R was rational by programming standards.)

I'll have a look at your NA approach, which sounds good, and see if write_sef likes it. If so, I think we are in good shape.

jrkrideau · February 1, 2021, 3:25pm

I think we may have it. Well, partly.

If you change the file paths to yours rather than mine, I think we have the first file in SEF format. I have set day, hour, and minute all to NA as it is better practice in R. I hope it is okay in SEF format.

I have not figured out how to set up the naming conventions for the files yet.

Duh, maybe I should include the code?

setwd("/home/john/RJunk/elinlun")


library(tidyverse)
library(dataresqc)

inventory <-  read.csv("Inventory_Berkeley_Earth.csv", sep = "\t", header = FALSE)

colnames(inventory) <- tolower(c("Other_ID", "City", "Modern_Country", "Lat.degN", "Lon.degE",
                         "Station_Elevation.m", "Start_Year", "End_Year" ))
allfiles = list.files("/home/john/RJunk/elinlun/allfiles", full.names=TRUE)

new_inventory  <-  arrange(inventory, city)  

dat1 <-  read.csv(allfiles[1], sep = "\t", header = FALSE)
names(dat1)  <-  c("year", "month", "value")

nought  <-  data.frame(day = NA, hour = NA , minute = NA)

dat1  <-  cbind(dat1, nought)

dat1  <-  dat1[, c("year", "month", "day", "hour", "minute", "value")]


outpath  <-  "/home/john/RJunk/elinlun/output"
variable = "ta"
lat  <-  new_inventory[1,][4]
lon  <-  new_inventory[1,][5]
alt  <-  new_inventory[1,][6]
cod  <-  new_inventory[1,][1]
id   <-  new_inventory[1,][1]
nam  <-  new_inventory[1,][2]
units  <-  "C"
sou  <-  "Berkeley"



write_sef(dat1, outpath = outpath, variable = variable, cod = cod,
          nam = nam, lat = lat, lon = lon, alt = alt, stat = "point", units = "C")

elinlun · February 1, 2021, 5:13pm

Thank you so much!
In Norwegian we say "a thousand thanks"!
I will work with these this evening.
The first thing that stops is this line:
inventory <- read.csv("C:/Users/elinl/Documents/UniBe/Instrumentelle data/Inventories/Inventory_Berkeley-Earth.txt", sep = "\t", header = FALSE)
Instead of the csv I use the txt file, which works. I don't know why, but it always stops with the csv.

It is running now, and I will check the rest. I have to change the wd.

Elin

jrkrideau · February 1, 2021, 5:44pm

I do not think that is the problem. I think it is because we have no time in the file.

Oh, you withdrew the post as I was typing!

I have to go out for a while but should be able to have another look later today.

John

jrkrideau · February 1, 2021, 8:36pm

How does this look? You will need to change the paths in setwd, outpath and in the list.files command to your paths.

The download Berkeley_Earth.zip contains 57 files: 56 data files and a copy of the inventory file Inventory_Berkeley-Earth.txt, which needs to be removed. I have the equivalent file Inventory_Berkeley-Earth.csv in "/home/john/RJunk/elinlun" and all the data files in "/home/john/RJunk/elinlun/allfiles".

This seems to run. We can add more metadata as needed.

setwd("/home/john/RJunk/elinlun")


library(tidyverse)
library(dataresqc)

inventory <-  read.csv("Inventory_Berkeley_Earth.csv", sep = "\t", header = FALSE)

colnames(inventory) <- tolower(c("Other_ID", "City", "Modern_Country", "Lat.degN", "Lon.degE",
                         "Station_Elevation.m", "Start_Year", "End_Year" ))
allfiles = list.files("/home/john/RJunk/elinlun/allfiles", full.names=TRUE)

new_inventory  <-  arrange(inventory, city)  

for(i in 1:nrow(new_inventory)){
  dat1 <-  read.csv(allfiles[i], sep = "\t", header = FALSE)
  names(dat1)  <-  c("year", "month", "value")

  nought  <-  data.frame(day = NA, hour = NA , minute = NA)

  dat1  <-  cbind(dat1, nought)

  dat1  <-  dat1[, c("year", "month", "day", "hour", "minute", "value")]

  outpath  <-  "/home/john/RJunk/elinlun/output"
  variable = "ta"
  lat  <-  new_inventory[i,][4]
  lon  <-  new_inventory[i,][5]
  alt  <-  new_inventory[i,][6]
  cod  <-  new_inventory[i,][1]
  id   <-  new_inventory[i,][1]
  nam  <-  new_inventory[i,][2]
  stat = "point"
  units  <-  "C"
  sou  <-  "Bk"

  write_sef(dat1, outpath = outpath, variable = variable, cod = cod,
            nam = nam, lat = lat, lon = lon, alt = alt, sou = sou, stat = stat, units = "C")
}

elinlun · February 2, 2021, 12:39pm

Hi again John,

Sorry for being late (yesterday was a mess). Anyway, I think it is OK, but I have attached 2 files (jpg).

The Achen one is the one you helped me with. The York one is the one I made before it crashed.
The main point is to have all the metadata about the station, or the location of the meteorological observation, in the SEF file (from the inventory)!
So just some issues here:

1. I don't think we need the "NA" code for the hour and minute data, so those should just be left empty (no data here). Is that easy to fix?
I wrote hh=rep("",nrow(x)) and mm=rep("",nrow(x)) in the old version. That is because we don't use hour or minute data in the model I will use, just monthly data. Daily data can sometimes be used in the model. The main point is that I don't want anything in the hour and minute columns.

2. The naming of the file is important. It is done with this script in the old code:
first_year <- min(df$y)
last_year <- max(df$y)
file.rename(from=paste0(outpath,"/", list.files(path=outpath, pattern=as.character(inventory$Other_ID[index]))),
to=paste0(outpath,"/", paste("BE",inventory$Other_ID[index],station_name,first_year,last_year,"ta_monthly.tsv", sep="_")))
Just look at the difference between the attached files.

3. The loop. Is it easy to make a loop with this new script?
In the Dropbox link I had 57 stations (I think), but actually there are 1759 stations that need to be reformatted into this SEF format. Do you think that is easy with the new script?

Thank you again John for answer.

Elin

elinlun · February 2, 2021, 12:44pm

Yes, this is correct. Actually it is 1759 files.

What is new here?

Your new version ran. Good!! But there was something wrong with the naming of the files, and I wonder if it is possible to do all the stations in one run!

Great work, John!
I owe you something.

Elin

elinlun · February 2, 2021, 1:01pm

Is it because of write_sef that I get this error message?

In write_sef(dat1, outpath = outpath, variable = variable, cod = cod, :
Period forced to 0 because of 'stat'
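
Judging from the function source pasted further down this thread, this is a warning rather than an error: when stat is "point" and the period is not already "0", the period is set to "0" and the file is still written. A minimal sketch of just that check, where check_period is a made-up name and not part of dataresqc:

```r
# check_period is a hypothetical helper that mirrors only the stat/period
# check visible in the write_sef source; it is not part of dataresqc
check_period <- function(stat, period = "") {
  if (stat == "point" && !all(as.character(period) == "0")) {
    period <- "0"
    warning("Period forced to 0 because of 'stat'")
  }
  period
}

p_mean  <- check_period("mean")                      # no warning, period is left alone
p_point <- suppressWarnings(check_period("point"))   # would warn, period becomes "0"
```

So a stat other than "point", such as "mean" for monthly means, would not trigger the message.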

elinlun · February 2, 2021, 1:08pm

Stat = mean
Source = Berkeley_Earth

jrkrideau · February 2, 2021, 1:48pm

Replying in general to your last three posts.

I think so. I used NA because in R it indicates a genuine absence of data, whereas " " may have a meaning depending on the data being analysed.
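
A small sketch of the difference, using invented one-row data frames (with_na and with_empty are illustrative names only):

```r
# Two ways to mark "no data" in a column
with_na    <- data.frame(year = 1774, month = 10, hour = NA,
                         stringsAsFactors = FALSE)
with_empty <- data.frame(year = 1774, month = 10, hour = "",
                         stringsAsFactors = FALSE)

is.na(with_na$hour)       # TRUE: R knows the value is missing
is.na(with_empty$hour)    # FALSE: "" is an ordinary, empty string

class(with_na$hour)       # "logical"
class(with_empty$hour)    # "character"
```

When a data frame is written with write.table, NA appears as the text NA by default, while "" leaves the field empty.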

About the name of the file. I am just beginning to understand how the write_sef function works. In my last attempt, I got "Bk_10682_177410-200507_ta.tsv" for Achen where sou = "Bk". The write_sef function automatically calculates the first_year and last_year.

Yes, but one of them was the inventory file Inventory_Berkeley-Earth.txt. I would check your 1759 files. You may only have 1758 data files plus that Inventory_Berkeley-Earth.txt file, which I do not think you want there.
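
One possible way to check, sketched on a throwaway folder so that it runs anywhere. The folder and the two station file names are invented for this example; only the inventory file name comes from the thread.

```r
# Build a throwaway folder with two fake data files plus the inventory file
folder <- file.path(tempdir(), "allfiles_demo")
unlink(folder, recursive = TRUE)          # start from an empty folder
dir.create(folder)
file.create(file.path(folder, c("station_a.txt", "station_b.txt",
                                "Inventory_Berkeley-Earth.txt")))

allfiles <- list.files(folder, full.names = TRUE)

# Keep everything whose file name does not start with "Inventory"
datafiles <- allfiles[!grepl("^Inventory", basename(allfiles))]

length(allfiles)    # 3
length(datafiles)   # 2
```

Filtering the list this way leaves the inventory file on disk but keeps it out of the loop.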

I have made the changes to stat and sou and this is the file name I have so far
Berkeley_Earth_10682_177410-200507_ta.tsv

I had not understood that the code in your Point 2 was a brute force method of assigning a file name. I will have to think about how to get the Station_Name and the term "monthly" into the file name.

elinlun · February 2, 2021, 1:55pm

I actually don't understand why this was deleted. Did you write anything here? Was there anything important I missed? Strange. Maybe my fingertips just did something odd. I have no clue. Sorry for deleting. I need all of this conversation for learning! Thanks for spending time on me and my scarce R knowledge.

elinlun · February 2, 2021, 3:12pm

Thanks for checking that

jrkrideau · February 2, 2021, 3:33pm

At the moment, it looks like the write_sef function is set up to provide either the "other_id" or the city (station name) but not both.

So we can get
Berkeley_Earth_10682_177410-200507_ta_monthly.tsv
or
Berkeley_Earth_Aachen_182906-201107_ta_monthly.tsv
but not both, and if we get Berkeley_Earth_Aachen_182906-201107_ta_monthly.tsv the ID in the file becomes Achen and not 155137.

Change cod <- new_inventory[i,][1] to cod <- new_inventory[i,][2] to see what I mean.

I am trying to hack the write_sef function, but while I can get both a Berkeley_Earth_10682_177410-200507_ta_monthly.tsv and a Berkeley_Earth_13013_Praha_Klementinum_177501-200504_ta_monthly.tsv output, I cannot get just Berkeley_Earth_13013_Praha_Klementinum_177501-200504_ta_monthly.tsv

So

jrkrideau · February 2, 2021, 3:41pm

OOPS, forgot
Here is the most recent working code with "monthly" in the file name and day, hour, minute set to "".


setwd("/home/john/RJunk/elinlun")


library(tidyverse)
library(dataresqc)

inventory <-  read.csv("Inventory_Berkeley_Earth.csv", sep = "\t", header = FALSE)

colnames(inventory) <- tolower(c("Other_ID", "City", "Modern_Country", "Lat.degN", "Lon.degE",
                         "Station_Elevation.m", "Start_Year", "End_Year" ))
allfiles = list.files("/home/john/RJunk/elinlun/allfiles", full.names=TRUE)

new_inventory  <-  arrange(inventory, city)  

for(i in 1:nrow(new_inventory)){
  print(i)
  dat1 <-  read.csv(allfiles[i], sep = "\t", header = FALSE)
  names(dat1)  <-  c("year", "month", "value")

  nought  <-  data.frame(day = "", hour = "" , minute = "")

  dat1  <-  cbind(dat1, nought)

  dat1  <-  dat1[, c("year", "month", "day", "hour", "minute", "value")]

  outpath  <-  "/home/john/RJunk/elinlun/output"
  variable = "ta"
  lat  <-  new_inventory[i,][4]
  lon  <-  new_inventory[i,][5]
  alt  <-  new_inventory[i,][6]
  cod  <-  new_inventory[i,][1]
  nam  <-  new_inventory[i,][2]
  stat = "mean"
  units  <-  "C"
  sou  <-  "Berkeley_Earth"
  note = "monthly"

  write_sef(dat1, outpath = outpath, variable = variable, cod = cod, nam = nam,
            lat = lat, lon = lon, alt = alt, sou = sou, stat = stat, units = "C", note = note)
}

elinlun · February 2, 2021, 3:44pm

Super! I will try it!

elinlun · February 2, 2021, 3:54pm

I ran it! It ran!! Fantastic!!

Now 56 beautiful files!
But I also want the name in the file name.
Apart from that, everything is perfect!!

elinlun · February 2, 2021, 4:03pm

What does this message mean?
It said 'missing argument to function call'.
But it ran, so is it not so important?

elinlun · February 2, 2021, 4:17pm

Berkeley_Earth_Aachen_182906-201107_ta_monthly.tsv is good!

jrkrideau · February 2, 2021, 4:59pm

Actually no because we lose the ID in the file. I think I have a simple hack of the write_sef function that gives us what we want.
I simply changed the line

filename <- paste(sou, cod, dates, variable, sep = "_")

to

filename <- paste(sou, cod, nam, dates, variable, sep = "_")

and saved the function under the name play, and it seems to work.

Paste the function below into R and

**New function play**

play  <-  function (Data, outpath, variable, cod, nam = "", lat = "", lon = "", 
    alt = "", sou = "", link = "", units, stat, metaHead = "", 
    meta = "", period = "", time_offset = 0, note = "", keep_na = FALSE, 
    outfile = NA) 
{
    for (i in 1:ncol(Data)) Data[, i] <- as.character(Data[, 
        i])
    header <- array(dim = c(12, 2), data = "")
    header[1, ] <- c("SEF", "1.0.0")
    header[2, ] <- c("ID", trimws(as.character(cod)))
    header[3, ] <- c("Name", trimws(as.character(nam)))
    header[4, ] <- c("Lat", trimws(as.character(lat)))
    header[5, ] <- c("Lon", trimws(as.character(lon)))
    header[6, ] <- c("Alt", trimws(as.character(alt)))
    header[7, ] <- c("Source", trimws(as.character(sou)))
    header[8, ] <- c("Link", trimws(as.character(link)))
    header[9, ] <- c("Vbl", trimws(as.character(variable)))
    header[10, ] <- c("Stat", trimws(as.character(stat)))
    header[11, ] <- c("Units", trimws(as.character(units)))
    header[12, ] <- c("Meta", trimws(as.character(metaHead)))
    if (stat == "point" & !all(as.character(period) == "0")) {
        period <- "0"
        warning("Period forced to 0 because of 'stat'")
    }
    if (!all(time_offset == 0) & !all(is.na(as.integer(Data[, 
        4]) + as.integer(Data[, 5])))) {
        times <- ISOdate(Data[, 1], Data[, 2], Data[, 3], Data[, 
            4], Data[, 5])
        times <- times - time_offset * 3600
        Data[which(!is.na(times)), 1] <- as.integer(substr(times[which(!is.na(times))], 
            1, 4))
        Data[which(!is.na(times)), 2] <- as.integer(substr(times[which(!is.na(times))], 
            6, 7))
        Data[which(!is.na(times)), 3] <- as.integer(substr(times[which(!is.na(times))], 
            9, 10))
        Data[which(!is.na(times)), 4] <- as.integer(substr(times[which(!is.na(times))], 
            12, 13))
        Data[which(!is.na(times)), 5] <- as.integer(substr(times[which(!is.na(times))], 
            15, 16))
    }
    DataNew <- data.frame(Year = Data[, 1], Month = Data[, 2], 
        Day = Data[, 3], Hour = Data[, 4], Minute = Data[, 5], 
        Period = as.character(period), Value = Data[, 6], Meta = as.character(meta), 
        stringsAsFactors = FALSE)
    if (!keep_na) 
        DataNew <- DataNew[which(!is.na(DataNew$Value)), ]
    if (substr(outpath, nchar(outpath), nchar(outpath)) != "/") {
        outpath <- paste0(outpath, "/")
    }
    if (is.na(outfile)) {
        j <- 3
        if (is.na(as.integer(DataNew[1, 3]))) 
            j <- 2
        if (is.na(as.integer(DataNew[1, 2]))) 
            j <- 1
        datemin <- paste(formatC(unlist(as.integer(DataNew[1, 
            1:j])), width = 2, flag = 0), collapse = "")
        datemax <- paste(formatC(unlist(as.integer(DataNew[dim(DataNew)[1], 
            1:j])), width = 2, flag = 0), collapse = "")
        dates <- paste(datemin, datemax, sep = "-")
        filename <- paste(sou, cod, nam, dates, variable, sep = "_")
        if (sou %in% c(NA, "")) 
            filename <- sub("_", "", filename)
        if (note != "") {
            note <- paste0("_", gsub(" ", "_", note))
        }
        filename <- gsub(" ", "", filename)
        filename <- paste0(outpath, filename, note, ".tsv")
    }
    else {
        filename <- paste0(outpath, outfile)
        if (substr(filename, nchar(filename) - 3, nchar(filename)) != 
            ".tsv") {
            filename <- paste0(filename, ".tsv")
        }
    }
    write.table(header, file = filename, quote = FALSE, row.names = FALSE, 
        col.names = FALSE, sep = "\t", dec = ".", fileEncoding = "UTF-8")
    write.table(t(names(DataNew)), file = filename, quote = FALSE, 
        row.names = FALSE, col.names = FALSE, sep = "\t", fileEncoding = "UTF-8", 
        append = TRUE)
    write.table(DataNew, file = filename, quote = FALSE, row.names = FALSE, 
        col.names = FALSE, sep = "\t", dec = ".", fileEncoding = "UTF-8", 
        append = TRUE)
    message(paste("Data written to file", filename))
}

and run this slightly revised code.

setwd("/home/john/RJunk/elinlun")


library(tidyverse)
library(dataresqc)

inventory <-  read.csv("Inventory_Berkeley_Earth.csv", sep = "\t", header = FALSE)

colnames(inventory) <- tolower(c("Other_ID", "City", "Modern_Country", "Lat.degN", "Lon.degE",
                         "Station_Elevation.m", "Start_Year", "End_Year" ))
allfiles = list.files("/home/john/RJunk/elinlun/allfiles", full.names=TRUE)

new_inventory  <-  arrange(inventory, city)  

for(i in 1:nrow(new_inventory)){
  print(i)  # show progress
  # read one station file: year, month, monthly mean temperature
  dat1 <-  read.csv(allfiles[i], sep = "\t", header = FALSE)
  names(dat1)  <-  c("year", "month", "value")

  # empty day, hour and minute columns (monthly data only)
  nought  <-  data.frame(day = "", hour = "" , minute = "")
  dat1  <-  cbind(dat1, nought)

  # column order used by the function: year, month, day, hour, minute, value
  dat1  <-  dat1[, c("year", "month", "day", "hour", "minute", "value")]

  # metadata for this station, taken from row i of the inventory
  outpath  <-  "/home/john/RJunk/elinlun/output"
  variable = "ta"
  lat  <-  new_inventory[i,][4]
  lon  <-  new_inventory[i,][5]
  alt  <-  new_inventory[i,][6]
  cod  <-  new_inventory[i,][1]
  nam  <-  new_inventory[i,][2]
  stat = "mean"
  units  <-  "C"
  sou  <-  "Berkeley_Earth"
  note = "monthly"

  play(dat1, outpath = outpath, variable = variable, cod = cod, nam = nam,
       lat = lat, lon = lon, alt = alt, sou = sou, stat = stat, units = "C", note = note)
}

Note that the only difference is that write_sef has changed to play.
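
An alternative that might avoid editing the function at all: the argument list above includes outfile, and the function body uses it as the file name when it is not NA. A rough sketch that builds such a name follows. make_sef_name is a made-up helper, not part of dataresqc, and the two data rows are invented to match the Praha_Klementinum file name quoted earlier.

```r
# make_sef_name is a hypothetical helper, not part of dataresqc.
# It assumes the data has year and month columns, first row earliest.
make_sef_name <- function(dat, sou, cod, nam, variable, note) {
  n     <- nrow(dat)
  first <- sprintf("%04d%02d", as.integer(dat$year[1]), as.integer(dat$month[1]))
  last  <- sprintf("%04d%02d", as.integer(dat$year[n]), as.integer(dat$month[n]))
  dates <- paste(first, last, sep = "-")
  paste0(paste(sou, cod, nam, dates, variable, note, sep = "_"), ".tsv")
}

# Invented first and last rows of a station file
dat <- data.frame(year = c(1775, 2005), month = c(1, 4), value = c(-1.2, 8.9))

outfile <- make_sef_name(dat, sou = "Berkeley_Earth", cod = 13013,
                         nam = "Praha_Klementinum", variable = "ta",
                         note = "monthly")
outfile
# "Berkeley_Earth_13013_Praha_Klementinum_177501-200504_ta_monthly.tsv"
```

The result could then be passed as outfile = outfile in the write_sef call; this is untested against dataresqc itself.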

elinlun · February 2, 2021, 7:13pm

I see.
Thank you, John!
But for now it did not work.
I have to look more at this.
Thank you for this "hacking". You've got skills!
This is the error message:
Error in play(dat1, outpath = outpath, variable = variable, cod = cod, :
could not find function "play"

For me it is late evening. I have to rest. See you tomorrow!

Elin
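
That error usually means the function definition was never run in the current R session: a function has to be pasted in (or loaded with source()) before the loop that calls it. A small sketch with a stand-in function, where play_demo is a made-up name so that it does not clash with the real play:

```r
# Before the definition is run, R does not know the name
exists("play_demo")          # FALSE in a fresh session

# Running the definition creates the function in the workspace
play_demo <- function(x) paste("Data written for", x)

exists("play_demo")          # TRUE
play_demo("Achen")           # "Data written for Achen"
```

The same applies after restarting R: the definition has to be run again before the loop.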