Removing unwanted % signs

Quack · August 19, 2019, 12:35am

In a data frame that I read in from a CSV file, there is a column of percent data that has % signs in it. What's the best way to remove the % signs and change the variable's class from "factor" to "numeric"?
The data look like this:

1  1  0:00  100%
2  1  1:15  100%
3  1  2:15  100%
4  1  3:30  100%
5  1  4:00  100%

FJCC · August 19, 2019, 12:59am

You can use the sub() function to do that.

# Make a data frame with a percent column z
df1 <- data.frame(x = 1:4, y = 11:14, z = paste0(rep(100, 4), "%"))
# Strip the "%" from column z, then convert the result to numeric
df1$z <- as.numeric(sub("%", "", df1$z))
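
It may help to see why the % has to be stripped before converting. Calling as.numeric() directly on a factor returns the internal level codes rather than the values, whereas sub() coerces the factor to character first. A minimal sketch, assuming a made-up vector pct (not the original data); fixed = TRUE is optional and simply treats "%" as a literal string:

```r
# Hypothetical factor column, similar in shape to the data in the question
pct <- factor(c("100%", "95%", "7.5%"))

# Pitfall: this returns the factor's integer level codes, not the percentages
codes <- as.numeric(pct)

# sub() coerces the factor to character, removes the "%",
# and as.numeric() then parses the remaining text
pct_num <- as.numeric(sub("%", "", pct, fixed = TRUE))
pct_num
#> [1] 100.0  95.0   7.5
```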

AJF · August 19, 2019, 1:03am

What about trying readr::parse_number()? You'd first have to run as.character() to turn the column from a factor into a character vector, and then parse_number() to turn it into a number.

suppressPackageStartupMessages(library(tidyverse))

df1 <- data.frame(col1 = c(1, 1, 1, 1, 1),
                  col2 = c("0:00", "1:15", "2:15", "3:30", "4:00"),
                  col3 = c("100%", "100%", "100%", "100%", "100%")
)

str(df1)
#> 'data.frame':    5 obs. of  3 variables:
#>  $ col1: num  1 1 1 1 1
#>  $ col2: Factor w/ 5 levels "0:00","1:15",..: 1 2 3 4 5
#>  $ col3: Factor w/ 1 level "100%": 1 1 1 1 1


df1 <- df1 %>% 
  mutate(col3 = readr::parse_number(as.character(col3)))

str(df1)
#> 'data.frame':    5 obs. of  3 variables:
#>  $ col1: num  1 1 1 1 1
#>  $ col2: Factor w/ 5 levels "0:00","1:15",..: 1 2 3 4 5
#>  $ col3: num  100 100 100 100 100

Created on 2019-08-18 by the reprex package (v0.3.0)

A side benefit of importing the file with the readr function read_csv() in the first place (as opposed to base read.csv()) is that the column will automatically be read in as character. Alternatively, you could just add the argument stringsAsFactors = FALSE to your read.csv() call.
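
For the base R route, here is a minimal sketch of reading with stringsAsFactors = FALSE and then cleaning the column. The inline CSV text and the column names id, time and pct are assumptions made up for illustration, since the original file's header wasn't shown:

```r
# Hypothetical CSV contents; the real file and its column names are unknown
csv_text <- "id,time,pct
1,0:00,100%
1,1:15,100%
1,2:15,100%"

# stringsAsFactors = FALSE keeps text columns as character instead of factor
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Remove the literal "%" and convert what is left to numeric
dat$pct <- as.numeric(sub("%", "", dat$pct, fixed = TRUE))

str(dat)
```

With a real file you would pass the file path as the first argument in place of text = csv_text.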
