Dear all,
I have a question (maybe silly).
I use the zoo library to create year quarters (not sure if there is a better solution) and I have never had problems with it. Now I would like to change some of the values.
In my example I would like to change "2019 Q3" into "2019 Q2" in the first response (2019-08-11) but I'm not sure how I can do it with 'yearqtr' which is neither a string nor number nor date.
I can guess that if you want to change 2019 Q3 to 2019 Q2 then you want to subtract a quarter from each of your observations, but it's not clear from your question.
Some extra detail, though, is that yearqtr is really a numeric vector with printing (and other) properties handled for you via the S3 system. E.g.:
By removing the S3 class, we can print the underlying values. (Alternatively, the vctrs package gives the same result via vctrs::vec_data(qtrs).)
Either way, you can see how yearqtrs are stored - as doubles with a year and corresponding fractional value for the quarter.
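To make the point about storage concrete, here is a minimal sketch. The vector `qtrs` and its four values are made up for illustration; only `as.yearqtr()` from zoo and base `unclass()` are used.

```r
library(zoo)

# Illustrative values only: four quarters of 2019
qtrs <- as.yearqtr(c("2019 Q1", "2019 Q2", "2019 Q3", "2019 Q4"))

class(qtrs)    # the S3 class that drives the "2019 Q1" style printing

# Dropping the class exposes the doubles underneath:
# the year plus 0, 0.25, 0.5 or 0.75 for Q1 to Q4
underlying <- unclass(qtrs)
print(underlying)
```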
So to subtract a single quarter (if that's what you want to do), you can subtract 1/4 (0.25):
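A small sketch of both options, assuming two example dates (the second one is invented; the first is the 2019-08-11 response from the question). Shifting every value and recoding only selected values are shown side by side, since it isn't yet clear which one is wanted.

```r
library(zoo)

# Assumed example data: only 2019-08-11 comes from the question
dates <- as.Date(c("2019-08-11", "2019-11-02"))
qtrs  <- as.yearqtr(dates)            # 2019 Q3, 2019 Q4

# Option 1: move every observation back by one quarter
shifted <- qtrs - 0.25                # 2019 Q2, 2019 Q3

# Option 2: recode only the observations matching a condition
recoded <- qtrs
recoded[dates == as.Date("2019-08-11")] <- as.yearqtr("2019 Q2")

format(shifted)
format(recoded)
```

Option 2 keeps the vector as a yearqtr, so there is no need to edit 2019.50 into 2019.25 by hand.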
Well, this is only an example file.
Let's say I do the analysis and, after creating the quarters, I can see that some of them should be relocated because the date does not correspond to the quarter in which the responses really took place. I need to be able to recode Q3 to Q2. Should I then change 2019.50 into 2019.25? If I do it this way:
Is it just that one specific example you'd like to change, or do you have a general rule that you'd like the replacement/changing to follow?
Whilst the reprex you posted is useful, I'm not clear on what question you're actually asking here. I think I've just shown how you can change the underlying values of a yearqtr value, but I'm not clear on what rule/logic you want to apply to change them for your example.
Can you clarify what you're actually trying to do here/what logical steps you want the code to follow?
By the way, I think this issue is caused by the "zoo" library, as I found two records in my large real data file that were allocated to Q2 2018 despite their dates being "01/07/2018 00:57:00" and "01/07/2018 00:35:00". All other 84 records from the 1st of July 2018 were coded into Q3 2018 properly (as were thousands of other records).
Unfortunately, I cannot submit an example, as this issue appears only with the large file. When I select only a handful of records from the same database (including the problematic ones), everything looks OK.
Perhaps "zoo" is not the best option and should be replaced by a better, more reliable package to get easy quarterly labels?
I would like to avoid doing something manual like this:
If the strangeness is happening right at the quarter boundaries (date and time), I would look at the time zones in the data, and on the computer you are doing the analysis on.
How have you got to that conclusion (what code have you used to test your assumptions)? Actually, the time zone difference is the most logical explanation. Could you make a reproducible example of the issue you are describing that includes explicit setting of the time zone?
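As a starting point, a reprex along these lines would show whether the time zone alone can move a record across the quarter boundary. It is only a sketch: it assumes the timestamp was recorded in UK local time ("Europe/London" is a guess, not something taken from the data), and it converts to a Date with an explicit `tz` so nothing depends on defaults.

```r
library(zoo)

# Assumption: 00:57 on 1 July 2018 is UK local time (BST, i.e. UTC+1)
x <- as.POSIXct("2018-07-01 00:57:00", tz = "Europe/London")

# The same instant read on the UTC clock falls on the previous day
format(x, tz = "UTC", usetz = TRUE)

# Quarter depends on which clock is used to pick the calendar date
q_local <- as.yearqtr(as.Date(x, tz = "Europe/London"))
q_utc   <- as.yearqtr(as.Date(x, tz = "UTC"))

format(q_local)
format(q_utc)
```

If the two results differ for the problematic records, the quarter labels are fine and the time zone handling upstream is what needs fixing.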
Well, I have this issue when I import the entire data set within specific date range (64468 records). When I select only 2 URNs containing this problematic date and one other URN (to have a mixture of both types), conversion is fine:
That is just an illustration; it is not reproducible.
My suspicion is that the second SQL query, which contains date filtering, is forcing a time zone onto the fetched data frame, but since you are not providing a reprex I can't be sure.
And it may also be worth asking, what timezone are you in (and therefore your R session is likely to be in), and what timezone does your database server use?
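On the R side, both can be checked with base functions. A rough sketch follows; `interview_date` is a hypothetical stand-in for the InterviewDate column as it arrives from the database, and "Europe/London" is an assumed value.

```r
# Time zone the R session itself is running in
Sys.timezone()

# Hypothetical stand-in for the fetched InterviewDate column
interview_date <- as.POSIXct("2018-07-01 00:35:00", tz = "Europe/London")

# Time zone attached to the column (NULL or "" means "session default")
attr(interview_date, "tzone")

# Re-express the same instants on the UTC clock without changing them
as_utc <- interview_date
attr(as_utc, "tzone") <- "UTC"
format(as_utc, usetz = TRUE)
```

Comparing `attr(..., "tzone")` for the full import and for the small subset would show whether the two queries return the column in different time zones.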
Exactly what I was thinking. From some of the earlier questions you've posted, @Slavek, I'm guessing you're not necessarily in UTC (maybe UTC+1), which could be the cause of the issue given that most of your misclassifications are within an hour of a date change at the quarter boundary.
Well, data collection took place in Germany (we are in the UK) but it is possible that two respondents answered the survey online from a different location. Nevertheless, I don't know where this information is saved in the data. All I have is URN and InterviewDate...
This shouldn't be relevant unless you are using a special "time date with time zone" column in your database. As I said, most likely the problem is generated while executing the query: the data could have been fetched with an automatic conversion to your local time zone.