I do exactly this with NetCDF files (which, as of version 4, are actually interoperable with HDF5 files):
library(tidyverse)
library(rhdf5)
# first, here's our extractor function. you can use it anonymously
# inside map_dfr; i'm separating it out here for clarity (and so you
# can reuse it). the extractor needs to accept a filename and
# return a data frame.
hdf5_extractor = function(fname) {
  data = h5read(file = fname, name = "data")
  # what you do here depends on how objects inside
  # the file are structured. if they're just vectors of equal length,
  # you can create and return a data frame like this (tibble() replaces
  # the deprecated data_frame(); naming the columns keeps them readable):
  return(tibble(
    SCC_Follow_Info = data$SCC_Follow_Info,
    something_else = data$something_else,
    another_thing = data$another_thing))
  # if they aren't vectors, you'll have to think about another way
  # to combine them into a data frame...
}
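# get the file list and pipe it into our extractor function.
# note that `pattern` is a regular expression, not a shell glob.
# naming the vector with the filenames lets .id record which file
# each row came from.
df_dim =
  list.files(pattern = "\\.hdf5$") %>%
  set_names(.) %>%
  map_dfr(hdf5_extractor, .id = "file.ID")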
If you have large HDF5 files and don't need everything from a particular column, you can also modify the extractor to filter its contents before returning them, so that only the rows you care about get bound into the final data frame.
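Here's a minimal sketch of what that filtering could look like. The name `hdf5_extractor_filtered`, the `min_value` argument, and the "keep rows above a threshold" rule are all made up for illustration, and it assumes the objects under `data` are equal-length numeric vectors, so adapt it to whatever your files actually contain:

```r
library(tidyverse)
library(rhdf5)

# hypothetical variant of the extractor: only keeps rows where
# SCC_Follow_Info is above a threshold. the column names and the
# threshold are placeholders, not taken from any real file.
hdf5_extractor_filtered = function(fname, min_value = 0) {
  data = h5read(file = fname, name = "data")
  tibble(
    # as.vector() drops the 1-d array dimensions h5read can return
    SCC_Follow_Info = as.vector(data$SCC_Follow_Info),
    something_else = as.vector(data$something_else)) %>%
    filter(SCC_Follow_Info > min_value)
}

# extra arguments to map_dfr are passed through to the extractor
df_filtered =
  list.files(pattern = "\\.hdf5$") %>%
  set_names() %>%
  map_dfr(hdf5_extractor_filtered, min_value = 10, .id = "file.ID")
```

This still reads each whole file into memory before filtering; it only shrinks what gets combined at the end. If reading is the bottleneck, `h5read` also takes an `index` argument for pulling just part of a dataset, which might be worth a look depending on how your files are laid out.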