cleaning imported dataframe

dplyr

#1

I'm rather new to R. I am trying to import a dataframe from Excel (.xls), and then delete the first row, rename the first column, and replace data in the first column. This appears to be a very simple task, but I can't seem to get anything to work in R. This would take about 15 seconds in Excel, but I'm trying to avoid having to do any manipulations in Excel before importing into R. It found it easier to show the actual dataframe rather than a reprex. Using the attached image, I would like to (A) delete the dataframe headings so that the first row (ie, row 1) is the name of each column. (B) I would like to change the column name for column 1 to "Area" rather than NA, and Column 2 to "Units" rather than NA. And lastly (C) I would like to change the data in the renamed Area column to shorter names by removing "Coal consumption in " so that only areas remain such as "World", "North America"...etc. Is there an easy way to do this in R?


#2

I would recommend reading some basic R e.g. this book https://r4ds.had.co.nz/.

Here are your three steps very simply using the tidyverse packages:

library(tidyverse)
library(readxl)

# (A)
my_data <- read_excel("data.xls", skip = 1) %>% 
  # (B)
  rename(Area = X__1, Units = X__2) %>% 
  # (C)
  mutate(Area = gsub("Coal consumption in ", "", Area))

Of course replace data.xls with the path to your data file.


#3

Thanks a lot. Its a learning curve. I'm working my way through R4ds. Trying to eliminate a lot of Excel!


#4

You might want to look at the documentation for readxl on how to import headers too. Currently your headers are considered row 1 within your dataframe.