Delete empty rows

fredm · September 29, 2021, 2:41pm

Dear Community,

I am currently working on a dataset that shows the development of mortality caused by road traffic accidents across a number of countries over the last 50 years. For my research, I would like to delete all columns (years) as well as all rows (countries) that contain no data at all. For the columns, I was able to use a solution from the forum in which I count the missing or empty values in each column and then delete every column in which all values are missing or empty:

# Count the missing or empty values in each column
colSums(is.na(PS22) | PS22 == "")

# Logical vector indicating whether each column is entirely empty (TRUE) or not (FALSE)
empty_columns <- colSums(is.na(PS22) | PS22 == "") == nrow(PS22)

# Remove the empty columns (drop = FALSE keeps a data frame even if only one column remains)
PS222 <- PS22[, !empty_columns, drop = FALSE]

For the rows (countries), however, this does not work directly, because a row contains not only the numbers for each year but also non-numeric values for other variables, e.g. the country code. I tried to apply the same logic as for the columns while taking into account only the columns that actually represent years, but I did not manage to find an acceptable solution. I would therefore like to ask whether some of you could give me a hand here.

Many thanks in advance!

TylerRen · October 3, 2021, 10:19pm

Using the is.na() function is probably your best bet. I've also seen dplyr's slice() used to pick the specific rows you want, or slice() with negative indices to exclude the rows you choose.
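
For what it's worth, here is a minimal base R sketch of the is.na() idea applied to rows, restricted to the year columns. The toy data, the column names country and code, and the assumption that year columns have four-digit names are all made up for illustration and are not taken from the original dataset.

```r
# Hypothetical toy data standing in for PS22 (column names are assumptions)
PS22 <- data.frame(
  country = c("Austria", "Belgium", "Chile"),
  code    = c("AUT", "BEL", "CHL"),
  `1970`  = c(12.1, NA, NA),
  `1971`  = c(NA, NA, NA),
  `1972`  = c(11.4, NA, 9.8),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumption: year columns are the ones whose names are exactly four digits
year_cols <- grep("^[0-9]{4}$", names(PS22), value = TRUE)

# Look only at the year columns when deciding whether a row is empty
years <- PS22[, year_cols, drop = FALSE]

# A row is empty if every year cell is NA or an empty string
empty_rows <- rowSums(is.na(years) | years == "") == length(year_cols)

# Keep the rows that have at least one yearly value
PS22_rows <- PS22[!empty_rows, ]
```

This is the same logic as the column version, with rowSums() in place of colSums() and the number of year columns in place of nrow().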

startz · October 3, 2021, 10:40pm

Look at remove_empty() in the janitor package.
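
A rough sketch of how that could look, assuming the janitor package is installed and PS22 is a plain data frame (not a tibble). The toy data and the identifier column names country and code are hypothetical. Note that remove_empty() treats only NA as empty, and with which = "rows" it checks every column, so the identifier columns have to be left out of the row check.

```r
# Assumes janitor is installed: install.packages("janitor")
library(janitor)

# Hypothetical toy data standing in for PS22 (column names are assumptions)
PS22 <- data.frame(
  country = c("Austria", "Belgium", "Chile"),
  code    = c("AUT", "BEL", "CHL"),
  `1970`  = c(12.1, NA, NA),
  `1971`  = c(NA, NA, NA),
  `1972`  = c(11.4, NA, 9.8),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop columns that consist entirely of NA (here: 1971)
no_empty_cols <- remove_empty(PS22, which = "cols")

# Run the row check on the year columns only, because the filled-in
# country and code cells would otherwise keep every row
year_cols <- setdiff(names(no_empty_cols), c("country", "code"))
kept <- remove_empty(no_empty_cols[, year_cols, drop = FALSE], which = "rows")

# Use the surviving row names to subset the full data frame
result <- no_empty_cols[rownames(kept), ]
```

If empty cells are stored as "" rather than NA, they would need to be converted to NA first for this to work.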

fredm · October 4, 2021, 8:34am

Cheers guys, I've managed to solve the problem as proposed.
