deleting null rows from specific columns

myusername · April 14, 2019, 9:57am

Hello,

I am trying to clean a csv dataset. My goal is to delete rows with null values if both columns have nulls.

For example:

I tried using complete.cases but I could not apply it with the mentioned condition.

I hope someone here can help me.
Thank you,

myusername · April 14, 2019, 9:51pm

I want to delete nulls only if both columns have nulls in the same row.

Yarnabrina · April 15, 2019, 3:15am

The function complete.cases can be used when you wish to remove a row with at least one null value in it. For the dataset that you've provided, it should work, as either both columns are null, or none of them is.

But if you wish to remove only those rows with all null in a general case, you can do it like the following:

# provided dataset
df <- data.frame(x = c(2, 4, 2, NA, -6, -4, -5, NA, -3, -2, -2, -2, NA, -2),
                 y = c(11, 20, 33, NA, 25, -28.8, -39.8, NA, -61.8, -72.8, -83.8, -94.8, NA, -116.8))

# removing only those rows with all NA
df[!(rowSums(x = is.na(x = df)) == ncol(x = df)),]
#>     x      y
#> 1   2   11.0
#> 2   4   20.0
#> 3   2   33.0
#> 5  -6   25.0
#> 6  -4  -28.8
#> 7  -5  -39.8
#> 9  -3  -61.8
#> 10 -2  -72.8
#> 11 -2  -83.8
#> 12 -2  -94.8
#> 14 -2 -116.8

# does the same, for this particular example
df[complete.cases(df),]
#>     x      y
#> 1   2   11.0
#> 2   4   20.0
#> 3   2   33.0
#> 5  -6   25.0
#> 6  -4  -28.8
#> 7  -5  -39.8
#> 9  -3  -61.8
#> 10 -2  -72.8
#> 11 -2  -83.8
#> 12 -2  -94.8
#> 14 -2 -116.8

^{Created on 2019-04-14 by the reprex package (v0.2.1)}

Consider the solution step by step:

df[!(rowSums(x = is.na(x = df)) == ncol(x = df)),]

First, it checks whether each element of the data.frame is NA or not in is.na(x = df) part. It returns a data.frame of the same dimension as that of df, with its elements being TRUE or FALSE according as they are null or not.
Next, rowSums(x = is.na(x = df)) finds the number of TRUE's in each row of the above data.frame, i.e. the number of NA elements in df.
Now, we need to check whether all elements of a row are null or not. If so, then a row must have that many NA as many columns of the df. The following part checks for that: rowSums(x = is.na(x = df)) == ncol(x = df).
The above yields TRUE or FALSE for all rows of df with TRUE corresponding to those rows which have all NA elements. You want all the other rows. So, we reverse the logical indices by negating them, and bingo! You get what you want.

I hope I've explained my code clearly. If you still have doubts, run it for your own dataset step by step and you'll get it.

Hope this helps.

Note Please ask your future questions with a REPRoducible EXample of your problem? It provides more specifics of your problem, and it helps others to understand what problem you are facing.

system · April 22, 2019, 3:15am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.