Performance issue reading binary-format Excel files (.xlsb)


#1

Hello Guys,

The code below has been running for the last 30 minutes and still gives me neither results nor an error. It seems the function reading the file is stuck somewhere inside my file.
Is excel.link the only package that can read binary-format Excel files in R?
Are there any alternatives, ideally with a data.table approach?

Regards,
Vishal

> ```
> library(excel.link)
> 
> start.time <- Sys.time()
> Cust_Transactions <- xl.read.file("C:/Users/vishals758/Desktop/Workspace/Apr-Sep.xlsb", header = TRUE, row.names = NULL, col.names = NULL,
>                                   xl.sheet = "sample", top.left.cell = "A1", na = "", password = NULL,
>                                   write.res.password = NULL, excel.visible = FALSE)
> 
> 
> end.time <- Sys.time()
> time.taken <- end.time - start.time
> time.taken
> ```

#2

My invariable practice with spreadsheet files is to export to CSV and read that:

 raw_data <- tibble::as_tibble(read.csv("name of file", stringsAsFactors = FALSE))
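
Since the question asked about data.table, here is a minimal sketch of the CSV route that uses `data.table::fread` when that package is installed and falls back to base `read.csv` otherwise. The helper name `read_exported_csv` and the small temporary demo file are placeholders for illustration, not something from the original workbook.

```r
# Hypothetical helper: read a CSV that was exported from the workbook.
# Uses data.table::fread if the package is installed, else base read.csv.
read_exported_csv <- function(path) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    # data.table = FALSE returns a plain data.frame
    data.table::fread(path, data.table = FALSE)
  } else {
    read.csv(path, stringsAsFactors = FALSE)
  }
}

# Demo on a small temporary file standing in for the real export.
demo_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, amount = (1:1000) * 1.5),
          demo_csv, row.names = FALSE)

# system.time() reports how long the read took.
timing <- system.time(raw_data <- read_exported_csv(demo_csv))
dim(raw_data)
timing[["elapsed"]]
```

On a large export, `fread` is usually much faster than `read.csv`, but how much depends on the file, so it is worth timing both on the real data.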

#3

Hi @technocrat. The binary format seems to be very compact, so I want to find a better way to read the binary file directly.

I tried converting it to CSV, but a mere 3204 KB binary file becomes a 104 MB .csv file.
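
Part of that gap is simply compression: plain CSV text is uncompressed, while the workbook is stored in compressed form. A rough sketch in base R shows how much the same table shrinks when the CSV is written through gzip. The data here is made up, and gzip is only a stand-in for whatever the workbook does internally, so the ratio will not match the real file.

```r
# Made-up transaction-like data; real files will compress differently.
set.seed(1)
n <- 50000
demo <- data.frame(
  customer = sample(sprintf("CUST%04d", 1:200), n, replace = TRUE),
  channel  = sample(c("online", "store", "phone"), n, replace = TRUE),
  amount   = round(runif(n, 1, 500), 2)
)

csv_path <- tempfile(fileext = ".csv")
gz_path  <- tempfile(fileext = ".csv.gz")

# Plain, uncompressed CSV.
write.csv(demo, csv_path, row.names = FALSE)

# The same text written through a gzip connection.
con <- gzfile(gz_path, "w")
write.csv(demo, con, row.names = FALSE)
close(con)

sizes <- c(csv = file.size(csv_path), gz = file.size(gz_path))
sizes
round(sizes[["csv"]] / sizes[["gz"]], 1)  # compression ratio
```

If disk space is the main worry with the CSV route, `read.csv(gz_path)` can read the gzipped file directly, so the export does not have to sit on disk uncompressed.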


#4

I have never heard of the excel.link package, but you can also try readxl - https://github.com/tidyverse/readxl.

If it is just as slow, then at least you know the problem lies in how this particular file was created or saved.


#5

@mishabalyasin, thanks for the suggestion, but readxl does not support the binary .xlsb format, only .xls and the XML-based .xlsx. From its documentation:

> readxl supports both the legacy .xls format and the modern xml-based .xlsx format. The libxls C library is used to support .xls, which abstracts away many of the complexities of the underlying binary format. To parse .xlsx, we use the RapidXML C++ library.

Regards.


#6

I was hoping that maybe, based on the docs you quoted, the .xls and .xlsb formats were in fact the same. But this Stack Overflow answer suggests that isn't the case:

> .xlsb was introduced in Excel 2007 alongside .xlsx and .xlsm. All three formats use the OPC standard and are conceptually similar (whereas .xls, while also a binary format, is much different -- for example, it uses an OLE container format rather than zip)
>
> .xlsb is not compatible with .xls, and AFAICT there are no open source tools that can write XLSB.
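
The container difference described in that answer can be checked from R without any Excel-specific package, because zip files and OLE files start with different signature bytes. The sketch below is only an illustration: `container_type` is a hypothetical helper, and the two temporary files are fake stand-ins containing nothing but the signatures.

```r
# Hypothetical helper: guess the container type from the first bytes of a file.
container_type <- function(path) {
  magic <- readBin(path, what = "raw", n = 8)
  zip_sig <- as.raw(c(0x50, 0x4B, 0x03, 0x04))  # "PK\003\004"
  ole_sig <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  if (length(magic) >= 4 && identical(magic[1:4], zip_sig)) {
    "zip"      # zip-based containers such as .xlsx, .xlsm, .xlsb
  } else if (length(magic) >= 8 && identical(magic, ole_sig)) {
    "ole"      # OLE containers such as legacy .xls
  } else {
    "unknown"
  }
}

# Fake stand-in files that contain only the signature bytes.
fake_zip <- tempfile()
writeBin(as.raw(c(0x50, 0x4B, 0x03, 0x04, 0, 0, 0, 0)), fake_zip)
fake_ole <- tempfile()
writeBin(as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)), fake_ole)

container_type(fake_zip)
container_type(fake_ole)
```

Running `container_type()` on the real .xlsb should report a zip container if the answer above is right. That confirms the file is not a legacy .xls in disguise, but it says nothing about which reader can parse the binary parts inside the zip.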

I'm definitely keen to see if you find something capable of handling it!