A couple of general questions regarding a function

Hi everyone,

I have the following code but need to fix a couple of things.

First of all, data$lat_lowestmode and data$lon_lowestmode are latitude and longitude. When I run the function, the latitude and longitude values are cut short (they go from 31.11111111 to 31.1111). I would like to preserve the precision of the lat and long values.

Second, the shot_number is turned into NA. I've tried using data$shot_number[bit64conversion = 'bit64'] but it didn't do anything.

Lastly, data$rh causes the function to stop working. I think it has to do with the fact that rh is not just one vector like the other variables, but is in fact a bunch of variables labeled rh0, rh1, rh2, ..., rh100 representing height metrics (I'm working with lidar data). They're in a table of their own, so I need to figure out some bit of code to add to data$rh to include rh in the data frame.

path_to_h5 = "D:/GEDI_Thesis_Data/L2A"
f = dir(pattern = "*.h5", path_to_h5)

hdf5_extractor = function(fname){
  data = h5read(file = fname, name = "BEAM0000")
  return(data_frame(
    data$lat_lowestmode,
    data$lon_lowestmode,
    data$shot_number,
    data$degrade_flag,
    data$quality_flag,
    data$beam,
    data$rh
  ))
}

df_dim = 
  f %>%
  set_names(.) %>%
  map_dfr(hdf5_extractor, .id = "file.ID")

Can you post an example of the data object returned by h5read()? You can post the output of dput(data) here and others can use that to make a copy of your data.

Which data_frame() function are you using in hdf5_extractor()? The one in the tibble package is deprecated. Try using tibble() instead.

You could try wrapping data$rh in list() to make a list column in the tibble.
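For example, something like this (an untested sketch; I'm guessing at the shapes here, and the column names are just placeholders):

# sketch: tibble() with named columns, rh wrapped in list() as a list column
tibble(
  lat_lowestmode = as.vector(data$lat_lowestmode),  # as.vector() drops the 1-d array dims that h5read returns
  lon_lowestmode = as.vector(data$lon_lowestmode),
  rh             = list(data$rh)  # a length-1 list is recycled, so every row holds the same rh object
)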

Are you sure the latitude and longitude are being truncated? The print function applied to a tibble shows rounded values, but the full precision is still there. For example,

DF <- tibble(Num = c(3.111, 3.1111, 3.11111, 3.111111))
> DF
# A tibble: 4 × 1
    Num
  <dbl>
1  3.11
2  3.11
3  3.11
4  3.11
> DF$Num - 3
[1] 0.111000 0.111100 0.111110 0.111111

Hi FJCC, thank you again for your help here.

As far as which data_frame() function I am using, I am not entirely sure, but I am using the following packages:

install.packages("BiocManager")
BiocManager::install("rhdf5")
library("rhdf5")
library("maps")
library("bit64")
library("tidyverse")

So I've tried using dput() on the output to share; I'll paste that below (I hope it helps a bit, I don't really know what I'm doing).

As far as losing precision on lat and long, you are correct!! Looks like it preserved the precision.

For the rh metrics (not included in the dput below), I tried wrapping data$rh in list() and it didn't work. I also tried wrapping it in data.frame(), but it caused RStudio to abort the process. The rh metrics are columns of their own (see the screenshot below).

[screenshot: the rh metrics displayed as separate columns]

I've gone ahead and attached one HDF5 file through a link to my Google Drive; maybe this will help? The data are also freely available from NASA, if anyone is wondering.

https://drive.google.com/file/d/1Ltr3bsDsqGtYhZldhSdWNBCa5EIt7-_g/view?usp=sharing

bit_of_data = dput(df_dim[1:5,])
structure(list(file.ID = c("processed_GEDI02_A_2019121080625_O02166_02_T05019_02_003_01_V002.h5", 
"processed_GEDI02_A_2019121080625_O02166_02_T05019_02_003_01_V002.h5", 
"processed_GEDI02_A_2019121080625_O02166_02_T05019_02_003_01_V002.h5", 
"processed_GEDI02_A_2019121080625_O02166_02_T05019_02_003_01_V002.h5", 
"processed_GEDI02_A_2019121080625_O02166_02_T05019_02_003_01_V002.h5"
), `data$lat_lowestmode` = structure(c(31.0297468699227, 31.0301184379744, 
31.0304915669087, 31.0308620214345, 31.031233550866), .Dim = 5L), 
    `data$lon_lowestmode` = structure(c(-86.8076009329565, -86.8071849199289, 
    -86.8067683489037, -86.8063528681196, -86.805937003045), .Dim = 5L), 
    `data$shot_number[bit64conversion = "bit64"]` = c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), `data$degrade_flag` = structure(as.raw(c(0x00, 
    0x00, 0x00, 0x00, 0x00)), .Dim = 5L), `data$quality_flag` = structure(as.raw(c(0x01, 
    0x01, 0x01, 0x00, 0x01)), .Dim = 5L), `data$beam` = structure(c(0L, 
    0L, 0L, 0L, 0L), .Dim = 5L)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

Is the dput() output you posted the result of code like h5read(file = fname, name = "BEAM0000"), or is it the result of running hdf5_extractor()? It would be best to get the object produced by h5read, though I understand that may be far too big to post. I am not at all familiar with the rhdf5 package, so I'm struggling to be helpful.

What code exactly did you run when you tried to wrap data$rh in list()?

Hi FJCC,

I ran dput() on the output of hdf5_extractor(). Running dput() on the HDF5 object itself is problematic because the HDF5 file has many subfolders within each beam folder.

I wrapped the rh metrics in list() as follows:

hdf5_extractor = function(fname){
  data = h5read(file = fname, name = "BEAM0000")
  return(data_frame(
    data$lat_lowestmode,
    data$lon_lowestmode,
    data$shot_number,
    data$degrade_flag,
    data$quality_flag,
    data$beam,
    list(data$rh)
  ))
}


I reached out to the data help center at my university a couple of days ago too, but they haven't gotten back to me. When they do, I'll share the results if a solution doesn't come sooner.

Cheers

Does data$rh have the same number of rows as the other parts of the data, such as data$lat_lowestmode? If so, it should be possible to append it column-wise to the other data.


Hi FJCC, data$rh does have the same number of rows, so appending the rh table to the rest of the data is definitely a possibility. I'll look into how to add that command to the function.
Thanks as always for your help

Check out the cbind() function in base R or the bind_cols() function from dplyr.
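For example, an untested sketch (I'm assuming here that data$rh already has one row per shot, and the names are just placeholders):

library(dplyr)

# build the main columns first, then append the rh columns with bind_cols()
main = tibble(
  lat_lowestmode = as.vector(data$lat_lowestmode),
  lon_lowestmode = as.vector(data$lon_lowestmode)
)
combined = bind_cols(main, as.data.frame(data$rh))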


Okay, so I think I've figured out the issue with the rh metrics.

When I read in the rh metrics on their own using the following code...

# read in rh metrics (file_one is a single HDF5 GEDI file, BEAM0000 is one of the beams in the file, and rh is the height metrics table)

rh = h5read(file_one, "BEAM0000/rh")

# I notice that it reads in the table as a matrix with rows as columns and columns as rows...
# so I use the following code to swap the rows and columns

rh = as.data.frame(t(rh))

# then rename the columns to match the height metric they represent (e.g. rh0, rh1, rh2...rh100)
colnames(rh) = paste0('rh', 0:100)

So I think the issue is that when I read in data$rh, it comes in with the rows as columns, and since each dataset has a different number of observations, the mismatched columns cause the code to fail.

In other words, I need to figure out how to use as.data.frame(t(rh)) in the function from the beginning of this thread. Hopefully this won't be too difficult to do.

Hi everyone, I wanted to share an update: I think I am making some progress.
Using the same code as above, and using tips and tricks from FJCC, I've managed to coerce the data$rh array to a data frame and transpose it so that the columns and rows match the other data of interest (see code below).

Now the issue is that data$rh has headers on its columns while the other data does not, so when importing the data I now get the error "can't combine 'processed_gedi.....' (data$rh) to 'processed_gedi....': incompatible sizes".

Basically, I just need to figure out how to give the other data column names before combining them into a data frame.

install.packages("BiocManager")
BiocManager::install("rhdf5")
library("rhdf5")
library("maps")
library("bit64")
library("tidyverse")

path_to_h5 = "D:/GEDI_Thesis_Data/L2A"
f = dir(pattern = "*.h5", path_to_h5)

hdf5_extractor = function(fname){
  data = h5read(file = fname, name = "BEAM0000")
  return(data_frame(
    data$lat_lowestmode,
    data$lon_lowestmode,
    data$shot_number,
    data$degrade_flag,
    data$quality_flag,
    data$beam,
    t(as.data.frame(data$rh))
  ))
}

df_dim = 
  f %>%
  set_names(.) %>%
  map_dfr(hdf5_extractor, .id = "file.ID")

h5closeAll()
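In case it helps anyone who finds this later, here is the direction I'm planning to try next. It's an untested sketch that combines FJCC's tips: tibble() with named columns, the transpose for rh, and bind_cols() to append the rh table. It also moves bit64conversion = "bit64" into the h5read() call itself, which, if I'm reading the rhdf5 docs right, is where it belongs (not as a subscript like I tried earlier):

library(rhdf5)
library(bit64)
library(tidyverse)

hdf5_extractor = function(fname){
  beam_name = "BEAM0000"
  # transpose rh so that shots are rows, then name the 101 height metrics
  rh = as.data.frame(t(h5read(fname, paste0(beam_name, "/rh"))))
  colnames(rh) = paste0("rh", 0:100)
  main = tibble(
    # as.vector() drops the 1-d array dims that h5read returns
    lat_lowestmode = as.vector(h5read(fname, paste0(beam_name, "/lat_lowestmode"))),
    lon_lowestmode = as.vector(h5read(fname, paste0(beam_name, "/lon_lowestmode"))),
    # bit64conversion should keep the full 64-bit shot numbers instead of NA
    shot_number    = h5read(fname, paste0(beam_name, "/shot_number"),
                            bit64conversion = "bit64"),
    # the flags come in as raw, so convert them to integers
    degrade_flag   = as.integer(h5read(fname, paste0(beam_name, "/degrade_flag"))),
    quality_flag   = as.integer(h5read(fname, paste0(beam_name, "/quality_flag"))),
    beam           = as.vector(h5read(fname, paste0(beam_name, "/beam")))
  )
  bind_cols(main, rh)  # same number of rows, so this should line up
}

If that works, the same set_names() %>% map_dfr() pipeline above should stack the files with the file.ID column intact.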
