Data frame Data cleaning

I have a dataset with a column named URL, for example:

URL
football.html
Athletics.html
and so forth. How do I extract the game names from the column and create another column named Games?

This is how I want it to look:

URL Games
football.html football
athletics.html athletics

You can use regular expressions (regex). See this example:

library(tidyverse)

sample_df <- data.frame(
    URL = c("football.html", "Athletics.html")
)

sample_df %>% 
    mutate(games = str_remove(URL, "\\.html$"))
#>              URL     games
#> 1  football.html  football
#> 2 Athletics.html Athletics

Created on 2021-05-18 by the reprex package (v2.0.0)

Note: Next time please provide a proper REPRoducible EXample (reprex) illustrating your issue.


Thank you, it worked. The problem is that I also have .php URLs, and they still appear as full URLs in the games column. How do I remove both .html and .php?

I apologise if my questions aren't clear; I'm new here and I'm also new to R.

You have to fine-tune the regular expression (i.e. "\\.(html|php)$") to fit your specific application. In that pattern, (html|php) matches either extension and $ anchors the match to the end of the string.

Regular expressions are common to many programming languages and are not specific to R. If you are going to be cleaning text often, you might benefit from learning about them.

library(tidyverse)

sample_df <- data.frame(
    URL = c("football.html", "Athletics.php")
)

sample_df %>% 
    mutate(games = str_remove(URL, "\\.(html|php)$"))
#>             URL     games
#> 1 football.html  football
#> 2 Athletics.php Athletics

Created on 2021-05-18 by the reprex package (v2.0.0)
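
If more extensions turn up later, listing each one in the pattern gets tedious. Here is a minimal sketch, assuming every URL is a bare file name with a single extension, that strips whatever follows the last dot. The .asp entry is made up for illustration and is not from the original data.

```r
library(stringr)

# Hypothetical sample; the .asp entry is invented for illustration
sample_df <- data.frame(
    URL = c("football.html", "Athletics.php", "rowing.asp"),
    stringsAsFactors = FALSE
)

# "\\.[^.]+$" matches the last dot and everything after it
sample_df$games <- str_remove(sample_df$URL, "\\.[^.]+$")

# Base R alternative from the tools package that ships with R
sample_df$games_base <- tools::file_path_sans_ext(sample_df$URL)

sample_df
```

Both columns should hold the same values here. The explicit "\\.(html|php)$" pattern is the safer choice if some names contain dots that are not extensions.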


Is there a way to do this when editing a csv file instead of a data frame? I want to be able to add the Games column to the csv file itself.

Not directly. You have to read the content of the csv file into memory (as a data frame), modify it, and then save the data back to csv format.

I tried doing this:
oldata3<- read.csv ("C:/Users/FBDA17-031/Documents/OlyDash/Oldata3.csv")
mutate(Games = str_remove(oldata3$URL, "\\.(html|php)$"))
oldata3$Games<-Games

and I got this error:
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "character"

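You are mixing tidyverse and base R syntax; you have to choose one. mutate() expects a data frame as its first argument (here supplied through the pipe) and returns the modified data frame, so it does not create a standalone Games object. Following my previous example, the code would look like this: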

library(tidyverse)

# Read the csv file into memory
oldata3 <- read.csv("C:/Users/FBDA17-031/Documents/OlyDash/Oldata3.csv")

# Create the new column
oldata3 <- oldata3 %>% 
    mutate(Games = str_remove(URL, "\\.(html|php)$"))

# Write the data frame back to a csv file
# (row.names = FALSE avoids adding a column of row numbers on each save)
write.csv(oldata3, "C:/Users/FBDA17-031/Documents/OlyDash/Oldata3.csv", row.names = FALSE)
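
For comparison, the same round trip can be written in base R only, without mutate() or the pipe. This is a sketch under assumptions: it writes a small made-up CSV to a temporary file in place of the real Oldata3.csv, and csv_path is a placeholder name.

```r
# Stand-in for the real file: a temporary CSV with a URL column (made-up data)
csv_path <- tempfile(fileext = ".csv")
write.csv(
    data.frame(URL = c("football.html", "Athletics.php")),
    csv_path,
    row.names = FALSE
)

# Read the csv file into memory
oldata3 <- read.csv(csv_path, stringsAsFactors = FALSE)

# Create the new column with base R's sub() instead of mutate()/str_remove()
oldata3$Games <- sub("\\.(html|php)$", "", oldata3$URL)

# Write the data frame back; row.names = FALSE avoids an extra index column
write.csv(oldata3, csv_path, row.names = FALSE)

# Re-read the file to check the result
read.csv(csv_path)
```

Assigning to oldata3$Games directly is the base R counterpart of mutate(); either style works as long as the two are not mixed in the same step.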

thank you, it worked :grin:
