Delete columns without "NULL"

Hi guys,
I have this data frame:


mergiato <- data.frame(subject="" , Gender="",Weight="", Height="",Age="",test="",act="",VelInc_X_mean="",VelInc_Y_mean="",VelInc_Z_mean="", VelInc_X_SD="",VelInc_Y_SD="",VelInc_Z_SD="",VelInc_MEAN="",VelInc_SD="", OriInc_w_mean="",OriInc_w_SD="",OriInc_x_mean="",OriInc_x_SD="",OriInc_y_mean="",OriInc_y_SD="",OriInc_z_mean="",OriInc_z_SD="",OriInc_MEAN="",OriInc_SD="",Acc_X_SD="",Acc_Y_mean="",Acc_Y_SD="",Acc_Z_mean="",Acc_Z_SD="",Acc_MEAN="",Acc_SD="",Gyr_X_mean="",Gyr_X_SD="", Gyr_Y_mean="",Gyr_Y_SD="",Gyr_Z_mean="",Gyr_Z_SD="",Gyr_MEAN="",Gyr_SD="",Mag_X_mean="",Mag_X_SD="",Mag_Y_mean="",Mag_Y_SD="",Mag_Z_mean="",Mag_Z_SD="",Mag_MEAN="",Mag_SD="",Roll_mean="",Pitch_mean="",Yaw_mean="",RSSI_mean="")

I want to remove some columns. I did it this way:

mergiato$VelInc_X_mean <- NULL
mergiato$VelInc_Y_mean <- NULL
mergiato$VelInc_Z_mean <- NULL
mergiato$VelInc_X_SD   <- NULL
mergiato$VelInc_Y_SD   <- NULL
mergiato$VelInc_Z_SD   <- NULL
mergiato$OriInc_w_mean <- NULL
mergiato$OriInc_x_mean <- NULL
mergiato$OriInc_y_mean <- NULL
mergiato$OriInc_z_mean <- NULL
mergiato$OriInc_w_SD   <- NULL
mergiato$OriInc_x_SD   <- NULL
mergiato$OriInc_y_SD   <- NULL
mergiato$OriInc_z_SD   <- NULL
mergiato$Acc_X_mean    <- NULL
mergiato$Acc_Y_mean    <- NULL
mergiato$Acc_Z_mean    <- NULL
mergiato$Acc_X_SD      <- NULL
mergiato$Acc_Y_SD      <- NULL
mergiato$Acc_Z_SD      <- NULL
mergiato$Mag_X_mean    <- NULL
mergiato$Mag_Y_mean    <- NULL
mergiato$Mag_Z_mean    <- NULL
mergiato$Mag_X_SD      <- NULL
mergiato$Mag_Y_SD      <- NULL
mergiato$Mag_Z_SD      <- NULL
mergiato$Gyr_X_mean    <- NULL
mergiato$Gyr_Y_mean    <- NULL
mergiato$Gyr_Z_mean    <- NULL
mergiato$Gyr_X_SD      <- NULL
mergiato$Gyr_Y_SD      <- NULL
mergiato$Gyr_Z_SD      <- NULL 

It works, and I get what I wanted, but I would like to write it in a more "beautiful" and more efficient way: if, for example, I had a data frame with 1000 columns to delete, I would have to write 1000 lines to delete them.

Let me add something that might be useful.

I think there could be a command like:

If the column name contains the character "x", "y" or "z" (uppercase or lowercase), delete the column.

I hope I have written a correct reprex. If not, I'll correct it right away.

There are lots of ways to subset data in R (in this case, selecting certain variables to eliminate), and the right one for you depends on your criteria.

Some options from the link below

# exclude variables v1, v2, v3
myvars <- names(mydata) %in% c("v1", "v2", "v3") 
newdata <- mydata[!myvars]

# exclude 3rd and 5th variable 
newdata <- mydata[c(-3,-5)]

# delete variables v3 and v5
mydata$v3 <- mydata$v5 <- NULL

You can also use conditional logic, either with base R or with other packages, such as dplyr:

But this way I still have to write out which columns I want and which I don't. It does not exploit the fact that the columns I want to delete contain the characters "_X_", "_y_", etc.

Right, that's why I pointed out the dplyr select helpers. Or you can use grepl() to select (columns not containing whatever string):
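As a minimal base R sketch of the grepl() idea (the toy data frame `df` and its column names are made up for illustration, and the pattern is only an assumption about what should be dropped), one could build a logical vector from the column names and keep the columns that do not match:

```r
# Toy data frame; the column names are hypothetical
df <- data.frame(subject    = 1:2,
                 Acc_X_mean = c(0.1, 0.2),
                 Acc_y_SD   = c(0.3, 0.4),
                 Acc_MEAN   = c(0.5, 0.6))

# TRUE for every column whose name contains _x_, _y_ or _z_ (any case)
to_drop <- grepl("_[xyz]_", names(df), ignore.case = TRUE)

# Keep only the columns that did not match; drop = FALSE keeps a data frame
newdf <- df[, !to_drop, drop = FALSE]
names(newdf)
```

The pattern would need adjusting to whatever rule identifies the unwanted columns in the real data.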

This is an example of Mara's advice; you'll need to customize the regex to your needs.

mergiato <- data.frame(subject="" , Gender="",Weight="", Height="",Age="",test="",act="",VelInc_X_mean="",VelInc_Y_mean="",VelInc_Z_mean="", VelInc_X_SD="",VelInc_Y_SD="",VelInc_Z_SD="",VelInc_MEAN="",VelInc_SD="", OriInc_w_mean="",OriInc_w_SD="",OriInc_x_mean="",OriInc_x_SD="",OriInc_y_mean="",OriInc_y_SD="",OriInc_z_mean="",OriInc_z_SD="",OriInc_MEAN="",OriInc_SD="",Acc_X_SD="",Acc_Y_mean="",Acc_Y_SD="",Acc_Z_mean="",Acc_Z_SD="",Acc_MEAN="",Acc_SD="",Gyr_X_mean="",Gyr_X_SD="", Gyr_Y_mean="",Gyr_Y_SD="",Gyr_Z_mean="",Gyr_Z_SD="",Gyr_MEAN="",Gyr_SD="",Mag_X_mean="",Mag_X_SD="",Mag_Y_mean="",Mag_Y_SD="",Mag_Z_mean="",Mag_Z_SD="",Mag_MEAN="",Mag_SD="",Roll_mean="",Pitch_mean="",Yaw_mean="",RSSI_mean="")
library(dplyr)
mergiato %>% 
    select(matches("^[^XYZ]*$"))
#>   subject Gender Weight Height Age test act VelInc_MEAN VelInc_SD
#> 1                                                                
#>   OriInc_w_mean OriInc_w_SD OriInc_MEAN OriInc_SD Acc_MEAN Acc_SD Mag_MEAN
#> 1                                                                         
#>   Mag_SD Roll_mean Pitch_mean RSSI_mean
#> 1

Created on 2019-02-19 by the reprex package (v0.2.1)


Thanks for the example. Only one problem: if I write [XYZ], doesn't it match every name that contains one of these letters? The string to check is "_X_" (underscore, X / Y / Z, underscore). Can I write the underscore directly, or does it conflict with some symbol that R uses to specify various actions?

Sorry, my mistake: I just realized that "_" was not visible when I posted the question, because it turns the word into italics.
So, to restate, the string is
underscore, X / Y / Z, underscore
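On the question of writing the underscore directly: "_" is not a special character in regular expressions, so it can be typed literally. A small sketch on a handful of names taken from the data frame (the variable names `nms`, `loose` and `strict` are just for illustration) shows how the two patterns differ:

```r
nms <- c("Gender", "Yaw_mean", "VelInc_X_mean", "OriInc_z_SD", "Gyr_MEAN")

# Any x, y or z anywhere in the name (case-insensitive)
loose <- grepl("[xyz]", nms, ignore.case = TRUE)

# Only an x, y or z standing between two underscores
strict <- grepl("_[xyz]_", nms, ignore.case = TRUE)

data.frame(name = nms, loose = loose, strict = strict)
```

With the stricter pattern, names such as Yaw_mean and Gyr_MEAN are no longer matched, because their "y" is not surrounded by underscores.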

The regular expression I can come up with for that is something like this example, but unfortunately tidyselect doesn't support look-ahead in its regex dialect, and I don't know a workaround for this.

mergiato <- data.frame(subject="" , Gender="",Weight="", Height="",Age="",test="",act="",VelInc_X_mean="",VelInc_Y_mean="",VelInc_Z_mean="", VelInc_X_SD="",VelInc_Y_SD="",VelInc_Z_SD="",VelInc_MEAN="",VelInc_SD="", OriInc_w_mean="",OriInc_w_SD="",OriInc_x_mean="",OriInc_x_SD="",OriInc_y_mean="",OriInc_y_SD="",OriInc_z_mean="",OriInc_z_SD="",OriInc_MEAN="",OriInc_SD="",Acc_X_SD="",Acc_Y_mean="",Acc_Y_SD="",Acc_Z_mean="",Acc_Z_SD="",Acc_MEAN="",Acc_SD="",Gyr_X_mean="",Gyr_X_SD="", Gyr_Y_mean="",Gyr_Y_SD="",Gyr_Z_mean="",Gyr_Z_SD="",Gyr_MEAN="",Gyr_SD="",Mag_X_mean="",Mag_X_SD="",Mag_Y_mean="",Mag_Y_SD="",Mag_Z_mean="",Mag_Z_SD="",Mag_MEAN="",Mag_SD="",Roll_mean="",Pitch_mean="",Yaw_mean="",RSSI_mean="")
grep("^(?!.*_[XxYyZz]_)", names(mergiato), perl = TRUE, value = TRUE)
#>  [1] "subject"       "Gender"        "Weight"        "Height"       
#>  [5] "Age"           "test"          "act"           "VelInc_MEAN"  
#>  [9] "VelInc_SD"     "OriInc_w_mean" "OriInc_w_SD"   "OriInc_MEAN"  
#> [13] "OriInc_SD"     "Acc_MEAN"      "Acc_SD"        "Gyr_MEAN"     
#> [17] "Gyr_SD"        "Mag_MEAN"      "Mag_SD"        "Roll_mean"    
#> [21] "Pitch_mean"    "Yaw_mean"      "RSSI_mean"
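A look-ahead may not be strictly needed here. As a sketch on a few sample names (the variable names are illustrative), matching the names that do contain the pattern and inverting the result with grep()'s invert argument should give the same set:

```r
nms <- c("subject", "VelInc_X_mean", "OriInc_w_SD", "OriInc_z_SD", "Yaw_mean")

# Negative look-ahead: names that do NOT contain _x_, _y_ or _z_
with_lookahead <- grep("^(?!.*_[XYZ]_)", nms,
                       perl = TRUE, ignore.case = TRUE, value = TRUE)

# Plain pattern plus invert = TRUE: same idea without look-ahead
with_invert <- grep("_[XYZ]_", nms,
                    ignore.case = TRUE, invert = TRUE, value = TRUE)

identical(with_lookahead, with_invert)
```

The same "match, then negate" idea should carry over to dplyr, for example as select(-matches("_[xyz]_")), since matches() ignores case by default.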

EDIT: It turns out that I can avoid tidyselect's regex helpers altogether by passing select() the column names returned by grep():

mergiato <- data.frame(subject="" , Gender="",Weight="", Height="",Age="",test="",act="",VelInc_X_mean="",VelInc_Y_mean="",VelInc_Z_mean="", VelInc_X_SD="",VelInc_Y_SD="",VelInc_Z_SD="",VelInc_MEAN="",VelInc_SD="", OriInc_w_mean="",OriInc_w_SD="",OriInc_x_mean="",OriInc_x_SD="",OriInc_y_mean="",OriInc_y_SD="",OriInc_z_mean="",OriInc_z_SD="",OriInc_MEAN="",OriInc_SD="",Acc_X_SD="",Acc_Y_mean="",Acc_Y_SD="",Acc_Z_mean="",Acc_Z_SD="",Acc_MEAN="",Acc_SD="",Gyr_X_mean="",Gyr_X_SD="", Gyr_Y_mean="",Gyr_Y_SD="",Gyr_Z_mean="",Gyr_Z_SD="",Gyr_MEAN="",Gyr_SD="",Mag_X_mean="",Mag_X_SD="",Mag_Y_mean="",Mag_Y_SD="",Mag_Z_mean="",Mag_Z_SD="",Mag_MEAN="",Mag_SD="",Roll_mean="",Pitch_mean="",Yaw_mean="",RSSI_mean="")
library(dplyr)
mergiato %>% 
  select(grep("^(?!.*_[XYZ]_)",
              names(mergiato),
              perl = TRUE,
              ignore.case = TRUE,
              value = TRUE))
#>   subject Gender Weight Height Age test act VelInc_MEAN VelInc_SD
#> 1                                                                
#>   OriInc_w_mean OriInc_w_SD OriInc_MEAN OriInc_SD Acc_MEAN Acc_SD Gyr_MEAN
#> 1                                                                         
#>   Gyr_SD Mag_MEAN Mag_SD Roll_mean Pitch_mean Yaw_mean RSSI_mean
#> 1

In this case I'd probably find tidyselect easier to read and maintain (I'm admittedly heavily biased against regex, though! :stuck_out_tongue:):

library(tidyverse)

mergiato <- data.frame(subject="" , Gender="",Weight="", Height="",Age="",test="",act="",VelInc_X_mean="",VelInc_Y_mean="",VelInc_Z_mean="", VelInc_X_SD="",VelInc_Y_SD="",VelInc_Z_SD="",VelInc_MEAN="",VelInc_SD="", OriInc_w_mean="",OriInc_w_SD="",OriInc_x_mean="",OriInc_x_SD="",OriInc_y_mean="",OriInc_y_SD="",OriInc_z_mean="",OriInc_z_SD="",OriInc_MEAN="",OriInc_SD="",Acc_X_SD="",Acc_Y_mean="",Acc_Y_SD="",Acc_Z_mean="",Acc_Z_SD="",Acc_MEAN="",Acc_SD="",Gyr_X_mean="",Gyr_X_SD="", Gyr_Y_mean="",Gyr_Y_SD="",Gyr_Z_mean="",Gyr_Z_SD="",Gyr_MEAN="",Gyr_SD="",Mag_X_mean="",Mag_X_SD="",Mag_Y_mean="",Mag_Y_SD="",Mag_Z_mean="",Mag_Z_SD="",Mag_MEAN="",Mag_SD="",Roll_mean="",Pitch_mean="",Yaw_mean="",RSSI_mean="")

# note that select helpers like contains() have ignore.case = TRUE by default
mergiato %>% 
    select(
      -contains('_x_'),
      -contains('_y_'),
      -contains('_z_'))

Yes, I admit that tidyselect is easier to read; however, for more complex situations regular expressions are much more flexible. BTW, I have simplified my regex a little bit to make it more competitive.

mergiato %>% 
  select(grep("^(?!.*_[XYZ]_)",
              names(mergiato),
              perl = TRUE,
              ignore.case = TRUE,
              value = TRUE))

Don't mind me having a sulk :laughing: (But yes, I agree that regex is an important and often irreplaceable tool!)

thank you all guys
