Date Fields - Do they need processing for my analysis?

Hello everyone,
I hope you are well. I have a crime dataset in CSV format that contains the variables below along with categories of crime. I am posting the summary for each of the variables.

The reporteddate field is in Unix time format, and it looks like it was already converted to a calendar date and split into several columns in the CSV file. When I read the file into R, I get the summary below.

Questions:
Should I worry about the integer type for those variables when I do the analysis? For example, reported year is an integer, and so are reported day and reported hour. What should I do with these variables? Later on, when I do data cuts to see the crime categories by year, day or hour, will the classes below suffice or should they be changed? I am a little confused here.
reporteddate : num 1.39e+12 1.40e+12 1.40e+12 1.40e+12 1.40e+12 ...
reportedyear : int 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
reportedmonth : Factor w/ 12 levels "April","August",..: 8 1 1 1 1 8 8 4 5 5 ...
reportedday : int 9 1 1 1 1 16 16 15 1 1 ...
reporteddayofweek : Factor w/ 7 levels "Friday ","Monday ",..: 4 6 6 6 6 4 4 3 7 7 ...
reportedhour : int 18 16 16 16 20 7 2 3 16 14 ...
Any help or pointers you can provide will be very helpful.

The integer values of year, day and hour seem fine. I am not sure the reporteddate was imported correctly. It seems to have values near 1e12 when it should be 1e9 for dates near the present. Also, the factors for reportedmonth and reporteddayofweek are ordered alphabetically. I would change those to have the usual order of January, February... and Monday, Tuesday... just to avoid possible problems later.
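A note on the magnitude: values near 1e12 are what Unix time looks like when it is stored in milliseconds rather than seconds, so the import may well be fine. Here is a minimal sketch of the conversion, assuming the column really is milliseconds since 1970-01-01 UTC. The two sample values are reporteddate values that appear in the dput() output later in this thread; `ms` and `reported` are just illustrative names.

```r
# Assumption: reporteddate holds milliseconds since the Unix epoch (UTC).
ms <- c(1394388840000, 1396370040000)

# as.POSIXct() expects seconds, so divide by 1000 first.
reported <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

format(reported, "%Y-%m-%d %H:%M:%S")
#> [1] "2014-03-09 18:14:00" "2014-04-01 16:34:00"
```

If the converted values agree with reportedyear, reportedday and reportedhour for the same rows, that is a good sign the column was read correctly.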


Thank you so much for your insights. I will have more questions as I begin this project for my course. Great catch with the alphabetical ordering of factors. A few more questions, if you don't mind:
a. How would I change it back to the usual order for both the week and the month?
b. Also, the dataset has both occurrence dates and reported dates. In my opinion, the occurrence date is the important one when building models. Does it really matter? Perhaps I could look at the lag between the date an incident occurred and the date it was reported?

You can change the levels of a factor with the factor() function. I will post an example below. I did not list all of the months just to save some typing.

As for which variables are important, I have no experience with crime data and I have not seen your data. It sounds plausible that the occurrence date is the most important but it is not a good idea to ignore a variable based on a hunch.
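If the lag between occurrence and report is of interest, it can be computed directly from the two timestamp columns rather than guessed at. This is a rough sketch, assuming both columns are milliseconds since the Unix epoch (as the 1e12 magnitude suggests). The data frame `crimes` and the column `lag_days` are made-up names, and the values are only illustrative.

```r
# Assumption: both columns are milliseconds since the Unix epoch.
crimes <- data.frame(
  occurrencedate = c(1393516800000, 1396353600000),
  reporteddate   = c(1394388840000, 1396384080000)
)

# Reporting lag in days: milliseconds -> seconds -> days.
crimes$lag_days <- (crimes$reporteddate - crimes$occurrencedate) / 1000 / 86400

round(crimes$lag_days, 2)
#> [1] 10.09  0.35
```

A summary() or histogram of such a column would show whether the lag varies enough to be worth keeping in a model.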

Okay, thank you. I shall wait for that example code. How do I share the open data set with the community? I think the CSV file is over 100MB.

Oops, I didn't post the code.

DF <- data.frame(Month = c("January", "September", "May", "July", "March", "November"),
                 stringsAsFactors = TRUE)  # needed in R >= 4.0 so that Month is read as a factor
levels(DF$Month)
#> [1] "January"   "July"      "March"     "May"       "November"  "September"
DF$Month <- factor(DF$Month, levels = c("January","March", "May", "July", "September", "November"))
levels(DF$Month)
#> [1] "January"   "March"     "May"       "July"      "September" "November"

Created on 2020-05-22 by the reprex package (v0.3.0)
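To avoid typing all twelve months, base R ships a built-in constant, month.name, that already holds the English month names in calendar order. A small sketch using the same toy data frame as above:

```r
# month.name is a built-in constant: "January", "February", ..., "December".
DF <- data.frame(Month = c("January", "September", "May", "July", "March", "November"))

# Use the calendar order as the level order.
DF$Month <- factor(DF$Month, levels = month.name)

nlevels(DF$Month)             # 12, including months that do not occur in the data
as.character(sort(DF$Month))  # sorted in calendar order, not alphabetically
```

Levels that never occur stay in the factor. droplevels() removes them if they get in the way of a table or plot.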

Okay. Thank you. I shall try copy pasting a sample of the set later.

Thanks a bunch. This worked.

I tried this code and it worked for the month. For the day of the week, however, it turned every count in the data to 0.
table(EDAfilter$reporteddayofweek)

   Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
    29999     30440     28628     28674     28958     29432     29190

EDAfilter$reporteddayofweek <- factor(EDAfilter$reporteddayofweek, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
I tried the above code and checked the table and 0 was what I got.
   Monday   Tuesday Wednesday  Thursday    Friday  Saturday    Sunday
        0         0         0         0         0         0         0

Any reason why this is happening?

Could you post a small sample of your data? Run the command

dput(head(EDAfilter))

and post the output here. Please put a line containing only three back ticks, ```, before and after the output so that it gets formatted properly.

"GO-2019770786", "GO-2019770798", "GO-2019770865", "GO-2019771055", 
"GO-2019771076", "GO-2019771114", "GO-2019771118", "GO-2019771171", 
"GO-2019771203", "GO-2019771217", "GO-201977147", "GO-2019771599", 
"GO-2019771715", "GO-2019771742", "GO-2019771753", "GO-2019771783", 
"GO-2019771870", "GO-201977191", "GO-2019772046", "GO-2019772188", 
"GO-2019772196", "GO-2019772240", "GO-2019772351", "GO-2019772526", 
"GO-2019772549", "GO-2019772553", "GO-2019772603", "GO-2019772676", 
"GO-2019772739", "GO-2019772786", "GO-2019772805", "GO-2019772949", 
"GO-2019773031", "GO-2019773039", "GO-2019773103", "GO-2019773128", 
"GO-2019773145", "GO-2019773244", "GO-2019773270", "GO-2019773481", 
"GO-201977376", "GO-2019773764", "GO-2019773843", "GO-2019773900", 
"GO-2019773927", "GO-2019773942", "GO-2019773962", "GO-201977433", 
"GO-201977434", "GO-2019774432", "GO-2019774448", "GO-2019774457", 
"GO-2019774535", "GO-2019774568", "GO-2019774606", "GO-2019774697", 
"GO-2019774723", "GO-2019774850", "GO-2019774876", "GO-2019774879", 
"GO-201977501", "GO-2019775010", "GO-2019775023", "GO-2019775260", 
"GO-201977537", "GO-2019775370", "GO-2019775428", "GO-2019775448", 
"GO-2019775489", "GO-201977554", "GO-2019775647", "GO-2019775750", 
"GO-2019775810", "GO-2019775841", "GO-2019775870", "GO-2019775895", 
"GO-2019775898", "GO-2019776046", "GO-2019776137", "GO-2019776159", 
"GO-2019776165", "GO-2019776181", "GO-2019776279", "GO-2019776333", 
"GO-2019776385", "GO-2019776462", "GO-2019776532", "GO-2019776573", 
"GO-2019776711", "GO-2019776712", "GO-2019776866", "GO-2019776999", 
"GO-2019777030", "GO-2019777101", "GO-2019777142", "GO-2019777261", 
"GO-2019777306", "GO-2019777307", "GO-2019777366", "GO-2019777408", 
"GO-2019777492", "GO-2019777497", "GO-2019777582", "GO-2019777756", 
"GO-2019777760", "GO-2019777899", "GO-2019777944", "GO-2019777985", 
"GO-2019778001", "GO-2019778021", "GO-2019778076", "GO-2019778195", 
"GO-201977820", "GO-2019778297", "GO-2019778394", "GO-2019778400", 
"GO-2019778505", "GO-201977851", "GO-2019778510", "GO-2019778675", 
"GO-2019778744", "GO-2019778815", "GO-2019778843", "GO-2019779033", 
"GO-2019779194", "GO-2019779205", "GO-2019779434", "GO-2019779467", 
"GO-2019779500", "GO-2019779507", "GO-2019779529", "GO-2019779565", 
"GO-2019779578", "GO-2019779600", "GO-2019779654", "GO-2019779698", 
"GO-2019779710", "GO-2019779859", "GO-2019779900", "GO-2019779904", 
"GO-2019779955", "GO-2019780063", "GO-2019780064", "GO-2019780071", 
"GO-2019780094", "GO-2019780172", "GO-2019780226", "GO-2019780366", 
"GO-2019780406", "GO-2019780407", "GO-2019780441", "GO-2019780443", 
"GO-2019780448", "GO-2019780674", "GO-2019780686", "GO-2019780845", 
"GO-2019781037", "GO-2019781179", "GO-2019781183", "GO-2019781189", 
"GO-2019781260", "GO-2019781440", "GO-2019781443", "GO-2019781579", 
"GO-2019781599", "GO-2019781638", "GO-2019781678", "GO-2019781745", 
"GO-2019781986", "GO-2019782052", "GO-2019782062", "GO-2019782128", 
"GO-2019782156", "GO-2019782172", "GO-2019782241", "GO-201978228", 
"GO-2019782337", "GO-2019782461", "GO-2019782832", "GO-2019783111", 
"GO-2019783162", "GO-2019783238", "GO-2019783282", "GO-2019783302", 
"GO-2019783341", "GO-2019783578", "GO-2019783748", "GO-2019783835", 
"GO-2019783977", "GO-201978398", "GO-201978403", "GO-2019784052", 
"GO-2019784172", "GO-2019784271", "GO-2019784368", "GO-2019784461", 
"GO-2019784539", "GO-2019784825", "GO-2019784956", "GO-2019785122", 
"GO-201978521", "GO-2019785320", "GO-2019785350", "GO-201978538", 
"GO-2019785389", "GO-2019785463", "GO-2019785551", "GO-2019785709", 
"GO-2019785803", "GO-2019785960", "GO-2019785996", "GO-2019786010", 
"GO-20197861", "GO-2019786171", "GO-2019786254", "GO-2019786310", 
"GO-2019786315", "GO-2019786358", "GO-2019786398", "GO-2019786399", 
"GO-2019786527", "GO-2019786566", "GO-2019786615", "GO-2019786667", 
"GO-201978669", "GO-2019786729", "GO-201978676", "GO-2019786872", 
"GO-2019786875", "GO-2019786957", "GO-2019787058", "GO-2019787158", 
"GO-2019787385", "GO-2019787523", "GO-2019787595", "GO-2019787617", 
"GO-2019787673", "GO-2019787734", "GO-2019787851", "GO-2019787891", 
"GO-2019787918", "GO-2019787953", "GO-2019788048", "GO-2019788049", 
"GO-2019788179", "GO-2019788364", "GO-2019788532", "GO-2019788534", 
"GO-2019788596", "GO-2019788616", "GO-2019788698", "GO-2019788766", 
"GO-2019788825", "GO-2019788869", "GO-2019788962", "GO-2019788967", 
"GO-2019788982", "GO-2019789064", "GO-2019789232", "GO-2019789314", 
"GO-2019789405", "GO-2019789411", "GO-201978967", "GO-2019789759", 
"GO-2019789854", "GO-2019789926", "GO-201979", "GO-2019790008", 
"GO-2019790064", "GO-2019790069", "GO-2019790172", "GO-2019790246", 
"GO-201979027", "GO-2019790408", "GO-2019790630", "GO-2019790716", 
"GO-2019790789", "GO-2019790818", "GO-2019791186", "GO-201979121", 
"GO-2019791272", "GO-2019791343", "GO-2019791450", "GO-2019791497", 
"GO-201979184", "GO-2019791853", "GO-2019791912", "GO-201979203", 
"GO-2019792149", "GO-2019792191", "GO-2019792252", "GO-2019792333", 
"GO-2019792380", "GO-2019792514", "GO-2019792636", "GO-2019792655", 
"GO-2019792656", "GO-2019792703", "GO-201979278", "GO-2019792815", 
"GO-2019792932", "GO-2019792979", "GO-2019793268", "GO-201979328", 
"GO-2019793303", "GO-2019793309", "GO-2019793353", "GO-2019793386", 
"GO-2019793429", "GO-2019793549", "GO-2019793564", "GO-2019793633", 
"GO-2019793704", "GO-2019793718", "GO-2019793833", "GO-2019793970", 
"GO-2019793971", "GO-2019794045", "GO-2019794216", "GO-2019794238", 
"GO-2019794321", "GO-2019794465", "GO-2019794577", "GO-2019794643", 
"GO-2019794695", "GO-2019794809", "GO-2019794901", "GO-2019794934", 
"GO-2019794992", "GO-2019795052", "GO-2019795094", "GO-2019795125", 
"GO-2019795249", "GO-2019795396", "GO-2019795502", "GO-2019795524", 
"GO-2019795715", "GO-2019795756", "GO-2019796011", "GO-2019796167", 
"GO-2019796312", "GO-2019796419", "GO-2019796617", "GO-2019796628", 
"GO-2019796637", "GO-2019796736", "GO-2019796739", "GO-2019796891", 
"GO-2019796897", "GO-201979690", "GO-2019796902", "GO-2019797036", 
"GO-2019797193", "GO-2019797227", "GO-2019797269", "GO-2019797284", 
"GO-2019797298", "GO-2019797304", "GO-2019797365", "GO-2019797475", 
"GO-2019797532", "GO-2019797618", "GO-2019797634", "GO-2019797671", 
"GO-2019797807", "GO-201979781", "GO-2019797815", "GO-2019798063", 
"GO-2019798126", "GO-2019798170", "GO-2019798257", "GO-2019798372", 
"GO-201979851", "GO-2019798723", "GO-2019798766", "GO-2019798834", 
"GO-2019798960", "GO-2019798984", "GO-2019798999", "GO-20197990", 
"GO-2019799075", "GO-2019799159", "GO-2019799193", "GO-201979920", 
"GO-2019799266", "GO-2019799291", "GO-2019799356", "GO-2019799390", 
"GO-2019799393", "GO-2019799586", "GO-201979969", "GO-2019799735", 
"GO-2019799773", "GO-2019799929", "GO-2019800074", "GO-2019800150", 
"GO-2019800328", "GO-2019800341", "GO-2019800352", "GO-2019800371", 
"GO-2019800379", "GO-2019800408", "GO-2019800413", "GO-2019800419", 
"GO-2019800429", "GO-2019800449", "GO-2019800604", "GO-2019800839", 
"GO-2019800857", "GO-2019801047", "GO-20198010620", "GO-2019801122", 
"GO-2019801133", "GO-201980117", "GO-2019801435", "GO-2019801473", 
"GO-2019801518", "GO-2019801549", "GO-2019801572", "GO-201980159", 
"GO-201980170", "GO-2019801711", "GO-2019801759", "GO-2019801823", 
"GO-2019801923", "GO-2019801948", "GO-2019802025", "GO-2019802107", 
"GO-2019802117", "GO-2019802149", "GO-2019802163", "GO-2019802164", 
"GO-2019802237", "GO-2019802349", "GO-2019802365", "GO-201980243", 
"GO-2019802459", "GO-2019802578", "GO-2019802685", "GO-2019802720", 
"GO-2019802722", "GO-2019802749", "GO-2019802838", "GO-2019802849", 
"GO-2019802850", "GO-201980286", "GO-2019802921", "GO-2019803014", 
"GO-2019803070", "GO-2019803197", "GO-2019803242", "GO-201980342", 
"GO-2019803513", "GO-2019803721", "GO-2019804004", "GO-2019804031", 
"GO-2019804101", "GO-2019804107", "GO-2019804300", "GO-2019804328", 
"GO-2019804337", "GO-2019804486", "GO-2019804511", "GO-2019804532", 
"GO-201980455", "GO-2019804614", "GO-2019804644", "GO-201980465", 
"GO-2019804712", "GO-2019804715", "GO-2019804717", "GO-2019804740", 
"GO-2019804787", "GO-2019804807", "GO-2019804974", "GO-2019804993", 
"GO-2019805006", "GO-2019805068", "GO-2019805140", "GO-2019805216", 
"GO-201980525", "GO-201980533", "GO-2019805337", "GO-2019805439", 
"GO-2019805468", "GO-2019805488", "GO-20198055", "GO-2019805554", 
"GO-2019805578", "GO-2019805927", "GO-2019805943", "GO-2019805987", 
"GO-201980607", "GO-2019806203", "GO-2019806229", "GO-2019806248", 
"GO-2019806360", "GO-2019806392", "GO-2019806514", "GO-2019806575", 
"GO-2019806660", "GO-2019806750", "GO-2019806757", "GO-2019806794", 
"GO-2019806844", "GO-2019806941", "GO-2019807070", "GO-201980708", 
"GO-2019807252", "GO-2019807366", "GO-2019807509", "GO-2019807903", 
"GO-2019808018", "GO-2019808079", "GO-2019808225", "GO-2019808272", 
"GO-2019808295", "GO-2019808420", "GO-2019808424", "GO-2019808435", 
"GO-2019808560", "GO-2019808621", "GO-2019808630", "GO-2019808742", 
"GO-2019808849", "GO-2019808898", "GO-2019808983", "GO-2019809044", 
"GO-2019809119", "GO-201980920", "GO-2019809203", "GO-2019809210", 
"GO-2019809238", "GO-2019809305", "GO-201980935", "GO-2019809478", 
"GO-2019809488", "GO-2019809495", "GO-2019809572", "GO-2019809616", 
"GO-2019809760", "GO-2019809839", "GO-2019809953", "GO-201981018", 
"GO-2019810207", "GO-2019810211", "GO-201981036", "GO-2019810398", 
"GO-2019810505", "GO-2019810541", "GO-2019810625", "GO-2019810684", 
"GO-2019810697", "GO-2019810865", "GO-2019810921", "GO-2019811074", 
"GO-2019811087", "GO-2019811120", "GO-2019811195", "GO-2019811273", 
"GO-2019811288", "GO-2019811314", "GO-2019811403", "GO-2019811429", 
"GO-2019811468", "GO-2019811545", "GO-2019811601", "GO-201981163", 
"GO-2019811718", "GO-2019811830", "GO-2019811865", "GO-2019811893", 
"GO-2019811924", "GO-2019811984", "GO-2019812036", "GO-2019812076", 
"GO-201981218", "GO-2019812209", "GO-2019812238", "GO-2019812348", 
"GO-2019812359", "GO-2019812371", "GO-2019812393", "GO-2019812553", 
"GO-2019812577", "GO-2019812647", "GO-2019812700", "GO-2019812713", 
"GO-201981273", "GO-2019812796", "GO-201981286", "GO-201981292", 
"GO-2019812992", "GO-2019813038", "GO-2019813109", "GO-2019813467", 
"GO-2019813499", "GO-2019813515", "GO-2019813524", "GO-2019813528", 
"GO-2019813552", "GO-2019813605", "GO-201981364", "GO-2019813656", 
"GO-2019813733", "GO-2019813988", "GO-2019814295", "GO-2019814299", 
"GO-2019814311", "GO-2019814446", "GO-2019814494", "GO-2019814513", 
"GO-2019814587", "GO-2019814721", "GO-2019814739", "GO-2019814792", 
"GO-2019814815", "GO-2019814821", "GO-2019814833", "GO-2019814858", 
"GO-2019814873", "GO-201981488", "GO-2019814921", "GO-2019814949", 
"GO-2019815057", "GO-2019815113", "GO-2019815124", "GO-2019815219", 
"GO-2019815255", "GO-2019815370", "GO-2019815416", "GO-2019815423", 
"GO-2019815519", "GO-2019815579", "GO-201981563", "GO-201981564", 
"GO-2019815649", "GO-201981569", "GO-2019815772", "GO-2019815781", 
"GO-201981585", "GO-2019816124", "GO-2019816148", "GO-2019816249", 
"GO-2019816323", "GO-2019816341", "GO-2019816379", "GO-2019816428", 
"GO-2019816446", "GO-2019816508", "GO-2019816511", "GO-2019816575", 
"GO-2019816576", "GO-2019816633", "GO-2019816815", "GO-2019816836", 
"GO-2019816868", "GO-2019816943", "GO-201981698", "GO-2019817023", 
"GO-201981705", "GO-2019817050", "GO-2019817060", "GO-2019817099", 
"GO-2019817135", "GO-2019817154", "GO-2019817156", "GO-2019817244", 
"GO-2019817297", "GO-2019817395", "GO-2019817445", "GO-201981750", 
"GO-2019817676", "GO-2019817678", "GO-2019817711", "GO-2019817716", 
"GO-2019817828", "GO-2019817931", "GO-2019817946", "GO-2019817984", 
"GO-2019818061", "GO-2019818102", "GO-2019818150", "GO-2019818165", 
"GO-2019818172", "GO-2019818200", "GO-2019818202", "GO-2019818492", 
"GO-2019818629", "GO-2019818873", "GO-2019818901", "GO-2019818966", 
"GO-201981898", "GO-2019818997", "GO-2019819000", "GO-2019819129", 
"GO-2019819132", "GO-2019819144", "GO-2019819242", "GO-2019819260", 
"GO-2019819290", "GO-201981934", "GO-2019819448", "GO-2019819490", 
"GO-2019819549", "GO-2019819633", "GO-2019819644", "GO-2019819767", 
"GO-2019819808", "GO-2019819931", "GO-2019819936", "GO-2019820018", 
"GO-201982002", "GO-2019820157", "GO-2019820224", "GO-2019820354", 
"GO-201982045", "GO-2019820638", "GO-2019820647", "GO-201982065", 
"GO-2019820698", "GO-2019820855", "GO-201982088", "GO-2019821089", 
"GO-2019821235", "GO-201982129", "GO-2019821321", "GO-2019821369", 
"GO-2019821477", "GO-2019821484", "GO-2019821500", "GO-201982162", 
"GO-2019821639", "GO-2019821666", "GO-2019821775", "GO-201982193", 
"GO-2019822102", "GO-2019822133", "GO-2019822276", "GO-2019822318", 
"GO-2019822323", "GO-2019822420", "GO-2019822511", "GO-2019822634", 
"GO-2019822660", "GO-2019822724", "GO-2019822775", "GO-2019822816", 
"GO-2019822827", "GO-2019822903", "GO-2019822918", "GO-2019822979", 
"GO-2019823187", "GO-2019823193", "GO-2019823302", "GO-2019823365", 
"GO-2019823469", "GO-2019823602", "GO-2019823650", "GO-2019823680", 
"GO-201982372", "GO-2019823730", "GO-201982380", "GO-2019823834", 
"GO-2019823843", "GO-2019823880", "GO-2019823902", "GO-2019823973", 
"GO-2019824057", "GO-2019824207", "GO-201982436", "GO-201982456", 
"GO-2019824613", "GO-2019824633", "GO-2019824653", "GO-2019824792", 
"GO-201982483", "GO-2019824897", "GO-2019824972", "GO-2019824989", 
"GO-2019879876", "GO-2019879911", "GO-201988013", "GO-2019880200", 
"GO-2019880277", "GO-2019880343", "GO-2019880346", "GO-2019880357", 
"GO-2019880370", "GO-2019880411", "GO-2019880421", "GO-2019880426", 
"GO-2019880711", "GO-2019880822", "GO-2019880825", "GO-2019881062", 
"GO-2019881167", "GO-2019881172", "GO-2019881194", "GO-201988125", 
"GO-2019881265", "GO-2019881323", "GO-2019881374", "GO-2019881462", 
"GO-201988150", "GO-2019881604", "GO-2019881605", "GO-2019881645", 
"GO-2019881681", "GO-2019881789", "GO-2019881830", "GO-2019882170", 
"GO-2019882208", "GO-2019882232", "GO-201988226", "GO-201988232", 
"GO-2019882416", "GO-2019882469", "GO-2019882473", "GO-2019882518", 
"GO-2019882769", "GO-2019882772", "GO-2019882778", "GO-201988280", 
"GO-201988285", "GO-2019882910", "GO-2019882914", "GO-201988292", 
"GO-2019882980", "GO-201988315", "GO-2019883166", "GO-2019883192", 
"GO-201988324", "GO-2019883251", "GO-2019883319", "GO-2019883433", 
"GO-201988345", "GO-201988355", "GO-2019883566", "GO-2019883635", 
"GO-2019883690", "GO-2019883837", "GO-2019884116", "GO-2019884121", 
"GO-2019884122", "GO-2019884143", "GO-2019884164", "GO-2019884189", 
"GO-2019884297", "GO-2019884311", "GO-201988438", "GO-2019884384", 
"GO-2019884414", "GO-2019884495", "GO-201988467", "GO-2019884692", 
"GO-2019884742", "GO-201988499", "GO-2019885127", "GO-2019885249", 
"GO-201988544", "GO-2019885475", "GO-2019885598", "GO-2019885605", 
"GO-2019885622", "GO-2019885773", "GO-2019885921", "GO-2019885925", 
"GO-2019885969", "GO-2019885991", "GO-201988617", "GO-2019886199", 
"GO-2019886238", "GO-2019886250", "GO-2019886374", "GO-2019886420", 
"GO-2019886462", "GO-2019886541", "GO-2019886823", "GO-2019886927", 
"GO-2019886959", "GO-201988702", "GO-2019887285", "GO-2019887325", 
"GO-2019887347", "GO-2019887369", "GO-2019887400", "GO-2019887503", 
"GO-2019887526", "GO-201988756", "GO-2019887594", "GO-2019887596", 
"GO-2019887608", "GO-201988764", "GO-2019887715", "GO-2019887746", 
"GO-2019887888", "GO-2019887893", "GO-2019888000", "GO-2019888053", 
"GO-2019888156", "GO-2019888550", "GO-2019888943", "GO-2019889", 
"GO-201988901", "GO-2019889026", "GO-2019889149", "GO-2019889191", 
"GO-2019889210", "GO-2019889454", "GO-2019889463", "GO-2019889469", 
"GO-2019889655", "GO-201988977", "GO-20198899", "GO-2019889902", 
"GO-2019889950", "GO-2019890132", "GO-2019890135", "GO-2019890152", 
"GO-2019890153", "GO-2019890332", "GO-2019890397", "GO-2019890583", 
"GO-2019890791", "GO-2019890855", "GO-2019890984", "GO-201989106", 
"GO-2019891063", "GO-2019891107", "GO-2019891111", "GO-2019891155", 
"GO-2019891173", "GO-2019891227", "GO-201989124", "GO-2019891251", 
"GO-2019891419", "GO-2019891477", "GO-2019891493", "GO-2019891712", 
"GO-2019891992", "GO-2019892009", "GO-201989209", "GO-2019892126", 
"GO-2019892195", "GO-2019892262", "GO-2019892263", "GO-2019892303", 
"GO-2019892402", "GO-2019892425", "GO-2019892459", "GO-2019892481", 
"GO-2019892533", "GO-2019892536", "GO-201989292", "GO-2019892984", 
"GO-2019893165", "GO-2019893341", "GO-2019893355", "GO-2019893418", 
"GO-2019893500", "GO-2019893764", "GO-2019893961", "GO-2019894034", 
"GO-2019894135", "GO-2019894161", "GO-2019894257", "GO-2019894275", 
"GO-2019894345", "GO-2019894411", "GO-201989449", "GO-2019894580", 
"GO-2019894623", "GO-2019894683", "GO-2019894714", "GO-2019894938", 
"GO-2019895165", "GO-2019895197", "GO-201989527", "GO-2019895305", 
"GO-2019895306", "GO-2019895425", "GO-2019895434", "GO-201989554", 
"GO-2019895626", "GO-2019895743", "GO-201989593", "GO-2019896017", 
"GO-201989602", "GO-2019896108", "GO-2019896127", "GO-2019896166", 
"GO-2019896321", "GO-2019896568", "GO-2019896649", "GO-2019896722", 
"GO-2019896823", "GO-2019896837", "GO-2019896865", "GO-2019896880", 
"GO-2019896882", "GO-2019896940", "GO-2019897084", "GO-2019897131", 
"GO-2019897204", "GO-2019897237", "GO-2019897249", "GO-2019897254", 
"GO-2019897264", "GO-2019897271", "GO-2019897454", "GO-2019897526", 
"GO-2019897597", "GO-2019897642", "GO-20198977", "GO-2019897731", 
"GO-2019897770", "GO-2019897968", "GO-2019897993", "GO-2019898115", 
"GO-2019898137", "GO-2019898382", "GO-2019898432", "GO-2019898527", 
"GO-2019898587", "GO-2019898709", "GO-2019898746", "GO-2019898776", 
"GO-2019898781", "GO-2019898807", "GO-2019898820", "GO-2019898944", 
"GO-201989896", "GO-2019899008", "GO-2019899060", "GO-2019899081", 
"GO-2019899201", "GO-2019899231", "GO-201989927", "GO-2019899442", 
"GO-2019899549", "GO-2019899552", "GO-2019899559", "GO-2019899599", 
"GO-2019899800", "GO-2019899831", "GO-2019899904", "GO-2019899931", 
"GO-2019899987", "GO-20199000346", "GO-20199000352", "GO-20199000465", 
"GO-201990005", "GO-20199001098", "GO-20199001327", "GO-20199001336", 
"GO-20199001449", "GO-20199001708", "GO-20199001723", "GO-20199001800", 
"GO-2019900255", "GO-2019900262", "GO-20199003274", "GO-20199003801", 
"GO-20199004185", "GO-20199004242", "GO-2019900458", "GO-20199004726", 
"GO-2019900479", "GO-20199005062", "GO-20199005336", "GO-2019900534", 
"GO-20199005351", "GO-20199005557", "GO-20199005561", "GO-20199005616", 
"GO-20199005806", "GO-2019900593", "GO-20199006293", "GO-2019900632", 
"GO-2019964325", "GO-201996449", "GO-2019964501", "GO-2019964627", 
"GO-201996464", "GO-2019964645", "GO-2019964649", "GO-2019964895", 
"GO-2019965090", "GO-2019965109", "GO-2019965177", "GO-2019965284", 
"GO-201996533", "GO-2019965397", "GO-2019965413", "GO-2019965414", 
"GO-2019965464", "GO-2019965465", "GO-2019965496", "GO-2019965504", 
"GO-201996560", "GO-2019965641", "GO-2019965727", "GO-2019965736", 
"GO-2019965763", "GO-201996592", "GO-2019965920", "GO-2019965944", 
"GO-2019965946", "GO-2019965964", "GO-201996600", "GO-2019966022", 
"GO-2019966062", "GO-2019966075", "GO-201996610", "GO-2019966133", 
"GO-2019966239", "GO-2019966321", "GO-2019966337", "GO-2019966390", 
"GO-2019966414", "GO-2019966464", "GO-2019966562", "GO-2019966696", 
"GO-2019966865", "GO-2019966925", "GO-2019966994", "GO-2019967225", 
"GO-2019967237", "GO-2019967318", "GO-2019967359", "GO-2019967362", 
"GO-2019967427", "GO-2019967510", "GO-2019967572", "GO-2019967574", 
"GO-2019967607", "GO-2019967626", "GO-2019967632", "GO-2019967647", 
"GO-2019967688", "GO-2019967717", "GO-2019967810", "GO-2019967918", 
"GO-2019967972", "GO-2019968038", "GO-2019968047", "GO-201996810", 
"GO-2019968241", "GO-2019968286", "GO-2019968386", "GO-2019968402", 
"GO-201996847", "GO-2019968550", "GO-2019968557", "GO-201996859", 
"GO-2019968693", "GO-2019968722", "GO-2019968742", "GO-2019968791", 
"GO-2019968825", "GO-2019968934", "GO-2019968951", "GO-201996897", 
"GO-201996900", "GO-2019969056", "GO-2019969066", "GO-2019969365", 
"GO-2019969442", "GO-2019969448", "GO-2019969457", "GO-2019969499", 
"GO-2019969712", "GO-2019969729", "GO-2019969790", "GO-2019969936", 
"GO-2019969997", "GO-2019970014", "GO-201997007", "GO-2019970128", 
"GO-2019970210", "GO-2019970215", "GO-2019970356", "GO-2019970364", 
"GO-2019970655", "GO-201997078", "GO-2019970806", "GO-2019970856", 
"GO-2019970900", "GO-2019970937", "GO-2019971029", "GO-2019971060", 
"GO-2019971085", "GO-2019971106", "GO-2019971168", "GO-201997122", 
"GO-2019971291", "GO-2019971348", "GO-2019971377", "GO-2019971382", 
"GO-2019971383", "GO-2019971420", "GO-2019971441", "GO-2019971493", 
"GO-2019971528", "GO-2019971611", "GO-2019971676", "GO-2019971903", 
"GO-2019971909", "GO-2019971927", "GO-2019971929", "GO-2019972063", 
"GO-2019972111", "GO-2019972223", "GO-2019972347", "GO-2019972373", 
"GO-2019972425", "GO-2019972432", "GO-2019972586", "GO-2019972659", 
"GO-2019972737", "GO-2019972761", "GO-201997278", "GO-2019972833", 
"GO-2019972844", "GO-2019972869", "GO-2019972875", "GO-2019973044", 
"GO-201997315", "GO-2019973223", "GO-2019973361", "GO-2019973434", 
"GO-2019973490", "GO-2019973527", "GO-2019973547", "GO-2019973566", 
"GO-2019973628", "GO-2019973688", "GO-2019973692", "GO-2019973732", 
"GO-2019973876", "GO-2019973895", "GO-2019973899", "GO-2019973904", 
"GO-2019974168", "GO-2019974171", "GO-2019974190", "GO-2019974277", 
"GO-2019974483", "GO-2019974596", "GO-2019974627", "GO-2019974658", 
"GO-2019974800", "GO-201997481", "GO-2019974906", "GO-2019974931", 
"GO-2019974945", "GO-2019975003", "GO-2019975238", "GO-2019975351", 
"GO-201997538", "GO-2019975409", "GO-2019975465", "GO-2019975635", 
"GO-2019975677", "GO-2019975907", "GO-2019975934", "GO-2019976034", 7"
), class = "factor"), occurrencedate = c(1393516800000, 1.396368e+12, 
1.396368e+12, 1.396368e+12, 1396353600000, 1394953500000), reporteddate = c(1394388840000, 
1396370040000, 1396370040000, 1396370040000, 1396384080000, 1394953500000
), premisetype = structure(c(3L, 1L, 1L, 1L, 3L, 5L), .Label = c("Apartment", 
"Commercial", "House", "Other", "Outside"), class = "factor"), 
    ucr_code = c(2132L, 1430L, 1420L, 2120L, 2130L, 1610L), ucr_ext = c(200L, 
    100L, 100L, 220L, 210L, 200L), offence = structure(c(39L, 
    6L, 12L, 17L, 42L, 29L), .Label = c("Administering Noxious Thing", 
    "Aggravated Aslt Peace Officer", "Aggravated Assault", "Aggravated Assault Avails Pros", 
    "Air Gun Or Pistol: Bodily Harm", "Assault", "Assault - Force/Thrt/Impede", 
    "Assault - Resist/ Prevent Seiz", "Assault Bodily Harm", 
    "Assault Peace Officer", "Assault Peace Officer Wpn/Cbh", 
    "Assault With Weapon", "B&E", "B&E - M/Veh To Steal Firearm", 
    "B&E - To Steal Firearm", "B&E Out", "B&E W'Intent", "Crim Negligence Bodily Harm", 
    "Disarming Peace/Public Officer", "Discharge Firearm - Recklessly", 
    "Discharge Firearm With Intent", "Pointing A Firearm", "Robbery - Armoured Car", 
    "Robbery - Atm", "Robbery - Business", "Robbery - Delivery Person", 
    "Robbery - Financial Institute", "Robbery - Home Invasion", 
    "Robbery - Mugging", "Robbery - Other", "Robbery - Purse Snatch", 
    "Robbery - Swarming", "Robbery - Taxi", "Robbery - Vehicle Jacking", 
    "Robbery With Weapon", "Set/Place Trap/Intend Death/Bh", 
    "Theft - Misapprop Funds Over", "Theft From Mail / Bag / Key", 
    "Theft From Motor Vehicle Over", "Theft Of Motor Vehicle", 
    "Theft Of Utilities Over", "Theft Over", "Theft Over - Bicycle", 
    "Theft Over - Distraction", "Theft Over - Shoplifting", "Traps Likely Cause Bodily Harm", 
    "Unlawfully Causing Bodily Harm", "Unlawfully In Dwelling-House", 
    "Use Firearm / Immit Commit Off"), class = "factor"), reportedyear = c(2014L, 
    2014L, 2014L, 2014L, 2014L, 2014L), reportedmonth = structure(c(8L, 
    1L, 1L, 1L, 1L, 8L), .Label = c("April", "August", "December", 
    "February", "January", "July", "June", "March", "May", "November", 
    "October", "September"), class = "factor"), reportedday = c(9L, 
    1L, 1L, 1L, 1L, 16L), reporteddayofyear = c(68L, 91L, 91L, 
    91L, 91L, 75L), reporteddayofweek = structure(c(4L, 6L, 6L, 
    6L, 6L, 4L), .Label = c("Friday    ", "Monday    ", "Saturday  ", 
    "Sunday    ", "Thursday  ", "Tuesday   ", "Wednesday "), class = "factor"), 
    reportedhour = c(18L, 16L, 16L, 16L, 20L, 7L), occurrenceyear = c(2014L, 
    2014L, 2014L, 2014L, 2014L, 2014L), occurrencemonth = structure(c(2L, 
    4L, 4L, 4L, 4L, 3L), .Label = c("January", "February", "March", 
    "April", "May", "June", "July", "August", "September", "October", 
    "November", "December"), class = "factor"), occurrenceday = c(27L, 
    1L, 1L, 1L, 1L, 16L), occurrencedayofyear = c(58L, 91L, 91L, 
    91L, 91L, 75L), occurrencedayofweek = structure(c(6L, 7L, 
    7L, 7L, 7L, 5L), .Label = c("", "Friday    ", "Monday    ", 
    "Saturday  ", "Sunday    ", "Thursday  ", "Tuesday   ", "Wednesday "
    ), class = "factor"), occurrencehour = c(16L, 16L, 16L, 16L, 
    12L, 7L), MCI = structure(c(5L, 1L, 1L, 3L, 5L, 4L), .Label = c("Assault", 
    "Auto Theft", "Break and Enter", "Robbery", "Theft Over"), class = "factor"), 
    Division = structure(c(15L, 10L, 10L, 10L, 8L, 4L), .Label = c("D11", 
    "D12", "D13", "D14", "D22", "D23", "D31", "D32", "D33", "D41", 
    "D42", "D43", "D51", "D52", "D53", "D54", "D55"), class = "factor"), 
    Hood_ID = c(101L, 121L, 121L, 121L, 34L, 85L), Neighbourhood = structure(c(45L, 
    92L, 92L, 92L, 6L, 110L), .Label = c("Agincourt North (129)", 
    "Agincourt South-Malvern West (128)", "Alderwood (20)", "Annex (95)", 
    "Banbury-Don Mills (42)", "Bathurst Manor (34)", "Bay Street Corridor (76)", 
    "Bayview Village (52)", "Bayview Woods-Steeles (49)", "Bedford Park-Nortown (39)", 
    "Beechborough-Greenbrook (112)", "Bendale (127)", "Birchcliffe-Cliffside (122)", 
    "Black Creek (24)", "Blake-Jones (69)", "Briar Hill-Belgravia (108)", 
    "Bridle Path-Sunnybrook-York Mills (41)", "Broadview North (57)", 
    "Brookhaven-Amesbury (30)", "Cabbagetown-South St.James Town (71)", 
    "Caledonia-Fairbank (109)", "Casa Loma (96)", "Centennial Scarborough (133)", 
    "Church-Yonge Corridor (75)", "Clairlea-Birchmount (120)", 
    "South Riverdale (70)", "St.Andrew-Windfields (40)", "Steeles (116)", 
    "Stonegate-Queensway (16)", "Tam O'Shanter-Sullivan (118)", 
    "Taylor-Massey (61)", "The Beaches (63)", "Thistletown-Beaumond Heights (3)", 
    "Thorncliffe Park (55)", "Trinity-Bellwoods (81)", "University (79)", 
    "Victoria Village (43)", "Waterfront Communities-The Island (77)", 
    ), class = "factor"), Long = c(-79.4176865, -79.2783966, 
    -79.2783966, -79.2783966, -79.4601822, -79.4252853), Lat = c(43.7005615, 
    43.7057724, 43.7057724, 43.7057724, 43.7657814, 43.6408386
    ), ObjectId = 1:6), na.action = structure(c(`3458` = 3458L, 
`3459` = 3459L, `4210` = 4210L, `4349` = 4349L, `4350` = 4350L, 
`4549` = 4549L, `5205` = 5205L, `5635` = 5635L, `8013` = 8013L, 
`8014` = 8014L, `9333` = 9333L, `9334` = 9334L, `9335` = 9335L, 
`10475` = 10475L, `11654` = 11654L, `13363` = 13363L, `17771` = 17771L, 
`22639` = 22639L, `22640` = 22640L, `22779` = 22779L, `27627` = 27627L, 
`29462` = 29462L, `34735` = 34735L, `35820` = 35820L, `36922` = 36922L, 
`41605` = 41605L, `44959` = 44959L, `46894` = 46894L, `51702` = 51702L, 
`51764` = 51764L, `56714` = 56714L, `56996` = 56996L, `57823` = 57823L, 
`57987` = 57987L, `64909` = 64909L, `74833` = 74833L, `81079` = 81079L, 
`99996` = 99996L, `124565` = 124565L, `124566` = 124566L, `134222` = 134222L, 
`137124` = 137124L, `147682` = 147682L, `157836` = 157836L, `157840` = 157840L, 
`158205` = 158205L, `160566` = 160566L, `162552` = 162552L, `166017` = 166017L, 
`, class = "omit"), row.names = c(NA, 
6L), class = "data.frame")```

I tried my best. Even with head(), the output was about 65000 characters and the forum didn't let me post it, so I manually deleted some lines. Many thanks, once again.
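A likely reason the output is so long is that a factor keeps all of its levels even when only six rows are kept, so dput() prints every case ID and neighbourhood name. The sketch below uses a made-up data frame `toy` as a stand-in for the real data and drops the unused levels with droplevels() before calling dput():

```r
# Toy stand-in for the real data: one factor with many levels.
toy <- data.frame(id  = factor(paste0("GO-", 1:1000)),
                  day = factor(rep(c("Monday", "Tuesday"), 500)))

# Keep six rows, then discard the levels that no longer occur.
small <- droplevels(head(toy))

nlevels(head(toy)$id)  # 1000: head() alone keeps every level
nlevels(small$id)      # 6
dput(small)            # now short enough to paste into a post
```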

Well, that didn't work, though it seems you tried what I asked. Since we are just worried about the week column at the moment, try reducing the data set to just that column.

JustWeek <- EDAfilter["reporteddayofweek"]
dput(head(JustWeek))

Okay, this is what it shows. Could it be because they converted the Unix time to timestamps and then created separate columns in the dataset?

structure(list(reporteddayofweek = structure(c(4L, 6L, 6L, 6L, 
6L, 4L), .Label = c("Friday    ", "Monday    ", "Saturday  ", 
"Sunday    ", "Thursday  ", "Tuesday   ", "Wednesday "), class = "factor")), row.names = c(NA, 
6L), class = "data.frame")

Your reporteddayofweek values have trailing spaces. In the code below I show how this leads to NA values if you run the factor() function on them using levels that do not have the spaces, and how to remove the spaces with the gsub() function.

DF <- structure(list(reporteddayofweek = 
                       structure(c(4L, 6L, 6L, 6L, 6L, 4L), 
                       .Label = c("Friday    ", "Monday    ", "Saturday  ", "Sunday    ", 
                                  "Thursday  ", "Tuesday   ", "Wednesday "), 
                       class = "factor")), row.names = c(NA, 6L), class = "data.frame")
TEST <- DF
TEST
#>   reporteddayofweek
#> 1        Sunday    
#> 2        Tuesday   
#> 3        Tuesday   
#> 4        Tuesday   
#> 5        Tuesday   
#> 6        Sunday
TEST$reporteddayofweek = factor(TEST$reporteddayofweek,
                                levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                                           "Friday", "Saturday", "Sunday"))
TEST
#>   reporteddayofweek
#> 1              <NA>
#> 2              <NA>
#> 3              <NA>
#> 4              <NA>
#> 5              <NA>
#> 6              <NA>

DF$reporteddayofweek = gsub(" ", "", DF$reporteddayofweek)
DF
#>   reporteddayofweek
#> 1            Sunday
#> 2           Tuesday
#> 3           Tuesday
#> 4           Tuesday
#> 5           Tuesday
#> 6            Sunday
DF$reporteddayofweek = factor(DF$reporteddayofweek,
                                levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                                           "Friday", "Saturday", "Sunday"))
DF
#>   reporteddayofweek
#> 1            Sunday
#> 2           Tuesday
#> 3           Tuesday
#> 4           Tuesday
#> 5           Tuesday
#> 6            Sunday

Created on 2020-05-22 by the reprex package (v0.3.0)
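One caveat: gsub(" ", "", x) removes every space, which is harmless for day names but would also squash values that contain internal spaces, such as "Theft Over". For those cases trimws() may be the safer choice, since it strips only leading and trailing whitespace. A sketch of the same fix with trimws(), using illustrative values and made-up names `days`, `week_order` and `days_clean`:

```r
# Day names padded with trailing spaces, as in the real column.
days <- factor(c("Sunday    ", "Tuesday   ", "Tuesday   ", "Sunday    "))

week_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# trimws() works on character vectors, so convert the factor first.
days_clean <- factor(trimws(as.character(days)), levels = week_order)

anyNA(days_clean)   # FALSE: every value matched a level
table(days_clean)   # counts appear in Monday-to-Sunday order
```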


Oh my god. I just don't have words to thank you. There is no way I could have figured this out. Many thanks, and I shall try this now and clean up.

I tried to summarise another variable and got the error below. This time it reports an unexpected symbol.

 mci<-EDAfilter %>% count(EDAfilter$MCI)
Error in parse(text = x) : <text>:1:7: unexpected symbol
1: Theft Over
          ^

It is very hard to say what is causing that without access to the data. It might be that there is a syntax problem with the csv file. The strange output from your first attempt to use dput() makes me suspicious. If you know the approximate number of rows that your data should have, you can run

nrow(EDAfilter)

and see if the result is plausible. You can also open the csv file with a plain text editor, such as Notepad on Windows, and look for malformed data, for example an unpaired quotation mark. You can also search for instances of the phrase Theft Over and see if the file contains something unexpected in one instance.
How big is this csv file?
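For a file too large to inspect by eye, one option is to let R count the fields on each line and flag the lines that disagree with the header. This is only a sketch on a tiny made-up file; `path`, `n_fields` and `bad_lines` are illustrative names, and for the real data `path` would point to the actual CSV.

```r
# Write a tiny CSV with one short row to a temporary file (illustrative data).
path <- tempfile(fileext = ".csv")
writeLines(c("id,MCI,year",
             "1,Assault,2014",
             "2,Theft Over",        # malformed: only two fields
             "3,Robbery,2014"), path)

# Number of comma-separated fields found on each line of the file.
n_fields <- count.fields(path, sep = ",", quote = "\"")

# Line numbers whose field count differs from the header's.
bad_lines <- which(n_fields != n_fields[1])
bad_lines
#> [1] 3
```

Any line numbers reported this way are the ones worth opening in the text editor first.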

By the way, the last code you shared should be

mci<-EDAfilter %>% count(MCI)

There is no need to use the data frame name and the $ operator within count().
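The parse() error is also consistent with a different count() being called, for example plyr::count(), which can mask dplyr's version if plyr is attached after dplyr. That is only a guess, since the thread does not show which packages were loaded. A sketch that sidesteps the question by naming the package explicitly, on a made-up data frame:

```r
library(dplyr)

# Illustrative data; the real EDAfilter has many more columns.
EDAfilter <- data.frame(MCI = c("Theft Over", "Assault", "Assault", "Robbery"))

# dplyr::count() takes bare column names; the :: prefix avoids any masking.
mci <- EDAfilter %>% dplyr::count(MCI)
mci
```

The result has one row per MCI category and a column n holding the number of rows in each.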

Thank you. I tried this and it worked. The csv file is about 60MB. If I run into other issues, I shall post again.

mci<-EDAfilter %>% group_by(MCI) %>% dplyr::summarise(Total = n())