Efficient way to get column names.

I have two Excel files:
a.) One with the column descriptions.
b.) The other with the detailed data, whose column names in the first row are V1, V2, ..., V54.

I want to read one column from the first data frame (the column names, as a character vector) and use it as the `col.names` argument when loading the second data frame.

Is there a short/efficient way to accomplish this?

Thanks.

```
library(readxl)      # read_xlsx()
library(data.table)  # fread()

## Read the first Excel file; its second column holds the target column names
ColnamesAggregateData_Train <- read_xlsx(
  "data\\train_data\\Updated_Column_Description.xlsx", range = "Aggregate_data!A1:B54",
  col_names = FALSE, col_types = NULL, na = "", trim_ws = TRUE,
  skip = 0, n_max = Inf, guess_max = min(14)
)
ColnamesAggregateData_Train <- ColnamesAggregateData_Train$X__2
AggregateData_Train <- fread(
  "data\\train_data\\AggregateData_Train.csv",
  drop = 1, col.names = ColnamesAggregateData_Train
)
```

Hi, it looks like your code was not formatted in a way that makes it easy to read for people trying to help you. Formatted code lets people identify more easily where issues may be occurring, and is easier to read in general. I have edited your post to format the code properly.

In the future please put code that is inline (such as a function name, like mutate or filter ) inside of backticks (`mutate`) and chunks of code (including error messages and code copied from the console) can be put between sets of three backticks:

```
example <- foo %>%
  filter(a == 1)
```

This process can be done automatically by highlighting your code, either inline or in a chunk, and clicking the </> button on the toolbar of the reply window!

This will help keep our community tidy and help you get the help you are looking for!

For more information, please take a look at the community's FAQ on formatting code.

Also, I'm not quite sure what the problem is. Does your code not work? Is it slow?


It looks like your example already does what you're looking for. What are you after? Something shorter? Is it too slow?

Hey, thanks a lot. This was my first post in this community, so I wasn't aware of the formatting conventions.

The code runs with no error, but I want to know if there is an alternative way to do this in one shot, rather than reading the file and then building a character vector. Something shorter?

Yes, something in one step: reading the column directly as a character vector and then using it in the `read_xlsx` function.

Not really one step, but you could read just the single column of names by changing your range:

`range = "Aggregate_data!A1:B54"`

to only the cells that hold the names. Then you aren't pulling in the whole table. Not that it's much to pull in.

I tried that before, but it needs more steps to clean the result into a character vector.

```
ColnamesTransactionData_Train <- as.character(read_xlsx(
  "data\\train_data\\Updated_Column_Description.xlsx", range = "Transaction_data!B1:B9",
  col_names = FALSE, col_types = NULL, na = "", trim_ws = TRUE,
  skip = 0, n_max = Inf, guess_max = min(14)
))
```

Output as follows:

```
print(ColnamesTransactionData_Train)

[1] "c(\"ID\", \"Transaction ID\", \"The date of transaction\", \"Debit/ Credit Indicator\", \"Transaction Type\", \"Disbursal End of month date\", \"Transaction End of Month date\", \"Transaction Mnemonic which helps to identify the type of transaction eg. ATM/NEFT/RTGS etc\", \"Amount of transaction\")"
```

I need to parse this.

Looks like you have a character vector to me. I'm not sure what you are wanting. Different names?
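
For what it's worth, the printed value above appears to be a single string containing the text `c(...)`, not nine separate names. That is what `as.character()` does to a data frame: it deparses each column into one string. Below is a minimal sketch of the difference, using a small made-up data frame as a stand-in for what `read_xlsx()` returns (the column name `X__1` and the three values are assumptions for illustration, not the real file contents):

```r
# Stand-in for the result of read_xlsx() on a one-column range:
# a data frame with a single character column (values are illustrative).
desc <- data.frame(
  X__1 = c("ID", "Transaction ID", "Amount of transaction"),
  stringsAsFactors = FALSE
)

# as.character() on a data frame deparses each column into one string,
# so a one-column data frame yields a length-1 vector holding "c(...)".
deparsed <- as.character(desc)
length(deparsed)

# Extracting the column itself returns the names as a character vector,
# one element per row, with nothing left to parse.
col_names <- desc[[1]]
length(col_names)
col_names
```

If this matches your situation, replacing `as.character(read_xlsx(...))` with `read_xlsx(...)[[1]]` should give the vector directly, and that expression could probably be passed straight to `col.names =` in `fread()` without an intermediate variable.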