Find and Replace Nonuniform Values

Hi! I am a new RStudio user and am working on cleaning a dataset on property values in Detroit from 2012-2019. I am trying to transform the "sale date" values into a column that contains just the year. Right now every value carries the full year, month, day, and time, which makes it impossible to use in a regression and also makes it difficult to do a uniform "find & replace" type operation. There are 300,000+ listings, each with a slightly different sale date (here is a sample of what the values look like):

my.data.clean$sale_date
[1] 2010-01-22T00:00:00.000Z 2010-10-27T00:00:00.000Z 2010-11-29T00:00:00.000Z
[4] 2010-01-22T00:00:00.000Z 2012-05-24T00:00:00.000Z 2012-03-01T00:00:00.000Z
[7] 2012-07-23T00:00:00.000Z 2011-08-30T00:00:00.000Z 2012-03-15T00:00:00.000Z
[10] 2014-03-14T00:00:00.000Z 2014-03-14T00:00:00.000Z 2011-04-01T00:00:00.000Z

Does anyone know of a way to do a general find & replace on all values in a column that contain 2010 (for example) and switch them to just the year? I.e., replace any value containing AT LEAST 2010 with just 2010?

Thank you!!!!!! Apologies for the very elementary question!

Hello,
First, lubridate is a useful package for working with dates:
library(lubridate)
(If you don't have this package yet, install it once by running install.packages("lubridate") in your console.)

Then this should work to create a sale_date_year column containing just the year of each sale date:

my.data.clean$sale_date_year <- year(my.data.clean$sale_date)
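
If you would rather not install a package, here is a minimal base R sketch of the same idea. It is not the lubridate approach above but a plain string operation, and it assumes every value starts with a four-digit year, as in the sample you posted. The vector name sale_date and the three values are just stand-ins for your real column.

```r
# Stand-in values copied from the question; the real column has 300,000+ rows.
sale_date <- c("2010-01-22T00:00:00.000Z",
               "2012-05-24T00:00:00.000Z",
               "2014-03-14T00:00:00.000Z")

# Assumption: each value begins with the four-digit year.
# as.character() guards against the column being stored as a factor.
sale_date_year <- as.integer(substr(as.character(sale_date), 1, 4))

print(sale_date_year)
# [1] 2010 2012 2014

# Quick sanity check: how many sales fall in each year.
print(table(sale_date_year))
```

On your data the equivalent line would be my.data.clean$sale_date_year <- as.integer(substr(as.character(my.data.clean$sale_date), 1, 4)). Because this only reads the first four characters, it will quietly give wrong answers if any value is in a different format, so it is worth checking the table() output for unexpected years.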

This worked!!!! Thank you so much!

