Large dataset visualization

I have a data frame called data in R with 102500 rows and 41 columns.

Each block of 41 rows is one frame measured at a different time point (i.e. 2500 frames in total). I need to draw a heatmap for each frame and, at the end, visualize the result in 3D so that each layer can be shown separately by stepping through the layers.

I tried this matrix to split my data:

m1 <- matrix(1:(41*2500*41), nrow=41*2500, ncol=41)
lst <- lapply(split(seq_len(nrow(m1)),(seq_len(nrow(m1))-1) %/%41 +1),
function(i) m1[i,])
arr1 <- array(0, dim=c(41,41,2500))
for(i in 1:2500){
arr1[,,i] <- lst[[i]]
}
dfs <- split(data,arr1)

for (p in dfs) { print(dfs$p) }
for (p in dfs){
NeatMap::heatmap1(as.matrix(p))

but it is not working properly.

Do you have any suggestions for how to define my x, y, and z axes so that I can split my data frame and visualize it in 3D at the end?

This code is working properly for one data frame:

NeatMap::heatmap1(data[1:41,])

But it will take forever if I have to split the data manually and create a heatmap for each piece!

Hi,

Welcome to the RStudio community!

Could you please explain what the end result is supposed to look like? I don't understand the 3D part: you said you want to create 41 heatmaps with the heatmap1 function, but that package also has a profileplot3d function, which is confusing, I must say.

I also suggest you read up on how to create a reprex, as your code is currently very confusing. A reprex consists of the minimal code and data needed to recreate the issue or question you're having. You can find instructions on how to build and share one here:

I don't think this is what you want, but here is some code I wrote based on your question:

library(NeatMap)

#Generate a dummy matrix 
m1 = matrix(1:(41*2500*41), nrow=41*2500, ncol=41)

#Split into 41 matrices of 2500 rows each
m1 = lapply(1:41, function(x){
  m1[(1+2500*(x-1)):(2500*x),]
})

#Generate a heatmap for each of the 41 matrices
heatmaps = lapply(m1, heatmap1)

#Render a heatmap
heatmaps[[1]]

Please elaborate and update the code, and we can work from there.

Hope this helps,
PJ

Hello,

Thank you for your response. I attached a picture as an example of what my data is supposed to look like at the end. In this picture, you can see only 1/2500 of the entire dataset, meaning the data from 41 columns and 41 rows (out of 102500 rows). But as you go through the movie, you can see the different layers of the dataset (2500 different layers).

I used your code (with a small change to split the data into 2500 matrices of 41 rows each) and tried to apply the resulting split to my data (which I called snap) as follows:

library(NeatMap)

m1 = matrix(1:(41* 2500 * 41), nrow=41 * 2500, ncol=41)

m1 = lapply(1:2500, function(x){
m1[(1+41*(x-1)):(41*x),]
})

df <- split(snap,m1)

heatmaps = lapply(df, heatmap1)

heatmaps[[1]]

But when I run the chunk and get to the df <- split(snap,m1) line, I get the following error:
Error: cannot allocate vector of size 35.4 Gb

I also tried this code on a smaller part of my dataset, and I got the same error again!

Thank you again for your help,
Sally
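
A possible explanation for the allocation error, offered as an assumption rather than a diagnosis: when the grouping argument of split() is a list, R combines its elements with interaction(), so a list of 2500 matrices can produce an enormous set of group levels. A smaller sketch of the row-wise split described above, using one frame label per row, might look like this. The names snap_demo, n_side, n_frames, and frame_id are made up for illustration, and the demo uses 5 frames instead of 2500.

```r
# Stand-in for the real `snap` (assumption: frames are stacked as
# consecutive blocks of 41 rows; 5 frames keep the demo small).
n_side   <- 41
n_frames <- 5
snap_demo <- as.data.frame(
  matrix(rnorm(n_side * n_frames * n_side),
         nrow = n_side * n_frames, ncol = n_side)
)

# One group label per row: 41 ones, then 41 twos, and so on.
frame_id <- rep(seq_len(n_frames), each = n_side)

# split() on a data frame divides its rows according to the grouping vector.
frames <- split(snap_demo, frame_id)

length(frames)     # number of frames
dim(frames[[1]])   # rows and columns of the first frame
```

With the real data, the same idea would be split(snap, rep(seq_len(2500), each = 41)), after which lapply(frames, function(f) NeatMap::heatmap1(as.matrix(f))) would draw one heatmap per frame, assuming each frame really is a block of 41 consecutive rows.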

library(shiny)
library(plotly)
library(shinycssloaders)

ui <- fluidPage(
  fluidRow(
    withSpinner(plotlyOutput("myplot"))
  ),
  fluidRow(
    sliderInput("myslider", "pick frame",
      min = 1, max = 2500, value = 1,
      width = "100%"
    )
  ),
  fluidRow(align="center",
    numericInput("mynumin","pick frame also",
                 min=1,max=2500,value=1)
  )
)

server <- function(input, output, session) {
  m1 <- matrix(0, nrow = 41 * 2500, ncol = 41)
  a1 <- array(m1, c(41, 41, 2500))

  #just made some function to populate the matrix with 'interesting' data
  matvals_k <- function(k) {
    outer(1:41, 1:41, function(i, j) {
      abs(i - 20)^log10(abs(k - 1250.5)) + log(j)
    })
  }

  for (k in 1:2500) {
    a1[, , k] <- matvals_k(k)
  }
 # first frame is a1[,,1] , second frame is a1[,,2]

  num_frame <- reactiveVal(1)
  
  observeEvent(num_frame(),{
               updateSliderInput(session=session,
                                 inputId="myslider",
                                 value = num_frame())
    updateNumericInput(session=session,
                      inputId="mynumin",
                      value = num_frame())           
    })
  observeEvent(input$myslider,
               num_frame(input$myslider)
  )
  observeEvent(input$mynumin,
               num_frame(input$mynumin)
  )
  
  frame_of_interest <- reactive({
    a1[, ,num_frame()]
  })
  output$myplot <- renderPlotly({
    plot_ly(z = ~ frame_of_interest()) %>%
      add_surface()
  })
}

shinyApp(ui, server)

Thanks for your input. I think I'll be able to use it for my data visualization. I tried to plug my data (snap) into the part where you populated your data, as follows:

matvals_k <- function(k) {
outer(1:41, 1:41, function(i, j) {
split(snap,a1)
})
}

But I get this error:

data length is not a multiple of split variable
Warning: Error in <-: dims [product 1681] do not match the length of object [1]

  52: outer
  51: matvals_k [#7]
  50: server [#13]
Error in dim(robj) <- c(dX, dY) :
  dims [product 1681] do not match the length of object [1]

Do you have any suggestion for the best way to put my data into this matrix?

Thank you,
Sally

You don't need your own version of matvals_k; that was just so I could populate a1 with something more interesting than all zeros.
If your data is snap, you should use

  a1 <- array(snap, c(41, 41, 2500))

and delete everything after it up to these lines:

 # first frame is a1[,,1] , second frame is a1[,,2]
  num_frame <- reactiveVal(1)

Thank you very much again. I fixed that part, but now I get the following error:

z must be a numeric matrix

I assume I should add as.matrix somewhere, but I can't figure out where z is being introduced here.

What is snap?
I can't even make a bad snap that throws that error when I try to... :smiley:

class(snap)
str(snap)

Oh, z is in the plotly call.
At that point it's not numeric? What is it?

output$myplot <- renderPlotly({
     str( frame_of_interest())
    plot_ly(z = ~ frame_of_interest()) %>%
      add_surface()
  })

class(snap)
[1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"

str(snap)
tibble [102,500 x 41] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ 1 : num [1:102500] 0.0563 0.0563 0.0564 0.0564 0.0564 ...
 $ 2 : num [1:102500] 0.0563 0.0564 0.0564 0.0564 0.0565 ...
 $ 3 : num [1:102500] 0.0564 0.0564 0.0564 0.0565 0.0565 ...
 $ 4 : num [1:102500] 0.0564 0.0564 0.0565 0.0565 0.0565 ...
 $ 5 : num [1:102500] 0.0564 0.0565 0.0565 0.0565 0.0565 ...
 $ 6 : num [1:102500] 0.0564 0.0565 0.0565 0.0565 0.0566 ...
 $ 7 : num [1:102500] 0.0565 0.0565 0.0565 0.0565 0.0566 ...
 $ 8 : num [1:102500] 0.0565 0.0565 0.0565 0.0566 0.0566 ...
 $ 9 : num [1:102500] 0.0565 0.0565 0.0566 0.0566 0.0566 ...
 $ 10: num [1:102500] 0.0565 0.0565 0.0566 0.0566 0.0566 ...
 $ 11: num [1:102500] 0.0565 0.0566 0.0566 0.0566 0.0566 ...
 $ 12: num [1:102500] 0.0565 0.0566 0.0566 0.0566 0.0567 ...
 $ 13: num [1:102500] 0.0566 0.0566 0.0566 0.0566 0.0567 ...
 $ 14: num [1:102500] 0.0566 0.0566 0.0566 0.0566 0.0567 ...
 $ 15: num [1:102500] 0.0566 0.0566 0.0566 0.0567 0.0567 ...
 $ 16: num [1:102500] 0.0566 0.0566 0.0566 0.0567 0.0567 ...
 $ 17: num [1:102500] 0.0566 0.0566 0.0566 0.0567 0.0567 ...
 $ 18: num [1:102500] 0.0566 0.0566 0.0567 0.0567 0.0567 ...
 $ 19: num [1:102500] 0.0566 0.0566 0.0567 0.0567 0.0567 ...
 $ 20: num [1:102500] 0.0566 0.0566 0.0567 0.0567 0.0567 ...
 $ 21: num [1:102500] 0.0566 0.0566 0.0567 0.0567 0.0567 ...
 $ 22: num [1:102500] 0.0566 0.0566 0.0567 0.0567 0.0567 ...
 $ 23: num [1:102500] 0.0566 0.0566 0.0567 0.0567 0.0567 ...
 $ 24: num [1:102500] 0.0566 0.0566 0.0567 0.0567 0.0567 ...
 $ 25: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 26: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 27: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 28: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 29: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 30: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 31: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 32: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 33: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 34: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 35: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 36: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 37: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 38: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 39: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 40: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 $ 41: num [1:102500] 0.0566 0.0567 0.0567 0.0567 0.0567 ...
 - attr(*, "spec")=
    .. cols(
    .. 1 = col_double(),
    .. 2 = col_double(),
    .. 3 = col_double(),
    .. 4 = col_double(),
    .. 5 = col_double(),
    .. 6 = col_double(),
    .. 7 = col_double(),
    .. 8 = col_double(),
    .. 9 = col_double(),
    .. 10 = col_double(),
    .. 11 = col_double(),
    .. 12 = col_double(),
    .. 13 = col_double(),
    .. 14 = col_double(),
    .. 15 = col_double(),
    .. 16 = col_double(),
    .. 17 = col_double(),
    .. 18 = col_double(),
    .. 19 = col_double(),
    .. 20 = col_double(),
    .. 21 = col_double(),
    .. 22 = col_double(),
    .. 23 = col_double(),
    .. 24 = col_double(),
    .. 25 = col_double(),
    .. 26 = col_double(),
    .. 27 = col_double(),
    .. 28 = col_double(),
    .. 29 = col_double(),
    .. 30 = col_double(),
    .. 31 = col_double(),
    .. 32 = col_double(),
    .. 33 = col_double(),
    .. 34 = col_double(),
    .. 35 = col_double(),
    .. 36 = col_double(),
    .. 37 = col_double(),
    .. 38 = col_double(),
    .. 39 = col_double(),
    .. 40 = col_double(),
    .. 41 = col_double()
    .. )

I even tried this:

output$myplot <- renderPlotly({
plot_ly(z = ~ as.matrix(frame_of_interest())) %>%
add_surface()
})
}

but I got the same error!

No, you need snap to be a matrix before you reshape it with array to become a1:

a1 <- array(as.matrix(snap), c(41, 41, 2500))
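
A small sketch of why the conversion matters, under the assumption that snap behaves like an ordinary data frame of numeric columns: array() applied to a data frame produces a list-mode array whose cells hold whole columns, so a slice of it is not a numeric matrix. The toy data frame df below is hypothetical.

```r
# Toy stand-in for a data frame of numeric columns (hypothetical data).
df <- data.frame(a = c(1, 2), b = c(3, 4))

# Without as.matrix(): the columns are recycled into a list-mode array,
# so each cell holds an entire column rather than a single number.
bad <- array(df, c(2, 2, 1))
is.numeric(bad)

# With as.matrix(): the values are flattened into a numeric vector first,
# so each slice of the array is an ordinary numeric matrix.
good <- array(as.matrix(df), c(2, 2, 1))
is.numeric(good)
is.matrix(good[, , 1])
```

This would also explain why wrapping frame_of_interest() in as.matrix() made no difference: by then the slice is already a list-mode matrix, and as.matrix() leaves it that way.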

It worked :slight_smile: Thank you very much. This is what it looks like now: (screenshot: Capture)

It is not quite what I expected it to look like, but I think I can work around it.

I appreciate your help very much.

Well, one thing to watch out for is whether snap, the way it is populated, is cut to the dimensions correctly.
41x41x2500 assumes a particular layout that may be incorrect. It might be 2500,41,41 or 41,2500,41.
There's no principled way for me to know that from over here...
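
One way to check the layout is to try the reshape on a tiny matrix whose contents reveal which frame each value came from. The sketch below assumes each frame occupies consecutive rows of the tall matrix, as the original post describes (data[1:41,] being one frame). Because R fills arrays column by column, reading that layout as 41,2500,41 and then moving the frame index last with aperm() keeps each frame intact. The helper to_frames and the sizes 3 and 4 are made up for illustration.

```r
n_side   <- 3   # stands in for 41
n_frames <- 4   # stands in for 2500

# Tall matrix with frames stacked by rows; every value is its frame number.
frame_of_row <- rep(seq_len(n_frames), each = n_side)
tall <- matrix(frame_of_row, nrow = n_side * n_frames, ncol = n_side)

# Hypothetical helper: index the tall matrix as (row in frame, frame, column),
# which matches column-major storage, then reorder to (row, column, frame).
to_frames <- function(m, n_side, n_frames) {
  aperm(array(m, c(n_side, n_frames, ncol(m))), c(1, 3, 2))
}

naive <- array(tall, c(n_side, n_side, n_frames))
fixed <- to_frames(tall, n_side, n_frames)

unique(as.vector(naive[, , 1]))  # values from several frames end up together
unique(as.vector(fixed[, , 1]))  # only values from frame 1
```

If the same assumption holds for the real data, a1 <- to_frames(as.matrix(snap), 41, 2500) would give frames where a1[, , k] equals rows (41 * (k - 1) + 1) to 41 * k of snap.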

I'll try both to see how they look. But basically snap comes from a CSV file that has 102500 rows and 41 columns. I've been trying to split the 102500 rows into 2500 data frames (2500 x 41 rows).

I'll play around with the code to see if I can get the outcome I want. Thank you very much again.
