Issue
I have a data set that I am trying to write to an H5 file so I can pass it to a Python package. I am populating the attributes using h5attr(object, attr_name) <- value. However, when I read back the attributes for each object in the group within my h5 file, the data class does not appear to be preserved. When I open the h5 file in Python with {h5py} and look at the attributes, every one is reported with dtype=object. Does anyone know if this is the default behaviour of the h5attr() function? If so, should I use create_attr() instead?
Thanks for any and all help! I recommend running this in R Markdown so that my two blocks of code can each go in their own chunk: one R chunk and one Python chunk.
Reprex - edited
I am providing a simplified version of the code with sample data for three objects/events within the first group.
Each event holds a 3x6000 matrix.
Each matrix should have 3 attributes: a numeric, a character string, and a list.
Edits
- The reticulate package is used at the end of the R chunk to pass the path of the h5 file created in R to the Python chunk.
- Removed the list format from the purrr functions; the functions work cleanly now.
R Code for Creating the File
library(hdf5r)
library(dplyr)
library(reticulate)
h5_file_path = here::here() # path to where you are creating the h5 file
# This line creates the empty file
NMTSO_trainer.h5 <- H5File$new(filename = sprintf("%s/NMTSO_trainer.h5", h5_file_path), mode = "a")
# This creates a group within the file; think of the file as a directory tree where each group is like a folder
data.grp <- NMTSO_trainer.h5$create_group("data")
# Items to populate attributes
trace_name = c("sample_event1", "sample_event2", "sample_event3")
col_names = c("att1", "att2", "att3")
value = list(runif(n = 1, -100, 100), "SC", list(c(0,0,runif(n = 1, 0, 5))))
# Place holder for the matrices per event
x = list()
events = length(trace_name)
# Populates the event matrices
for (i in 1:events){
  x[[i]] <- runif(n = 6000, -1, 1) %>% matrix(nrow = 1)
  x[[i]] <- rep(0,(2 * 6000)) %>% matrix(nrow = 2) %>% rbind(x[[i]])
}
# Puts each matrix within the corresponding "folder" in the h5 file
# walk2() is used because the call is made only for its side effect (writing to the file)
purrr::walk2(trace_name, x, function(trace_name, x){
  data.grp[[trace_name]] <- x
})
# Puts the corresponding attributes with each matrix - there should be 3 per matrix.
# This is where I am wondering if I should use create_attr() rather than h5attr()
purrr::walk(trace_name, function(trace_name){
  purrr::walk2(col_names, value, function(col_names, value){
    h5attr(data.grp[[trace_name]], col_names) <- value
  })
})
# Shows the class of the objects populated in the h5 file according to R
h5attr(data.grp[[trace_name[1]]], col_names[1]) %>% class()
h5attr(data.grp[[trace_name[1]]], col_names[2]) %>% class()
h5attr(data.grp[[trace_name[1]]], col_names[3]) %>% class()
# The file must be closed for all data to be written to the file
NMTSO_trainer.h5$close_all(close_self = TRUE)
# Passes the file path to the Python session through reticulate's py object
py$file_path = sprintf("%s/NMTSO_trainer.h5", h5_file_path)
Python Chunk for Evaluating Attr Format
Edits
- Passed the file_path object created in R into the Python environment using the reticulate package. This no longer requires any manual file_path manipulation, as long as the code is run in an R Markdown file in RStudio to take advantage of R's Python engine.
import h5py
import pandas as pd
import numpy as np
e = h5py.File(file_path, 'r')
# Shows the users what groups are in the file
list(e.keys())
group = e['data']
# Shows the user what events are in the group
list(group.keys())
# Shows the user what is in the attributes
group['sample_event1'].attrs['att1']
group['sample_event1'].attrs['att2']
group['sample_event1'].attrs['att3']
# Shows the user what format the data is in
type(group['sample_event1'].attrs['att1'])
type(group['sample_event1'].attrs['att2'])
type(group['sample_event1'].attrs['att3'])
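Since type() only reports the Python container class, it may also help to look at the dtype and shape of each attribute, which is where dtype=object would show up. Below is a minimal sketch of how that could be done for every event in the group at once. The helper names describe_attr and describe_group_attrs are hypothetical (they are not part of h5py), and the sketch assumes the file layout from the R chunk above, with one group called "data".

```python
def describe_attr(value):
    """Summarise one attribute value as returned by h5py (or any Python object)."""
    return {
        "python_type": type(value).__name__,
        # NumPy arrays and scalars carry .dtype and .shape; plain str does not.
        "dtype": str(value.dtype) if hasattr(value, "dtype") else None,
        "shape": tuple(value.shape) if hasattr(value, "shape") else None,
    }


def describe_group_attrs(path, group_name="data"):
    """Return {event: {attr_name: summary}} for every object in one group.

    Hypothetical helper; group_name="data" is an assumption taken from the
    R chunk in this post.
    """
    import h5py  # imported here so describe_attr can be used without h5py

    summary = {}
    # The context manager closes the file once the summaries are collected.
    with h5py.File(path, "r") as f:
        for event, obj in f[group_name].items():
            summary[event] = {
                name: describe_attr(val) for name, val in obj.attrs.items()
            }
    return summary
```

In the Python chunk this could be called as describe_group_attrs(file_path), which returns a nested dict with the three events as keys and att1, att2, att3 underneath each.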