I am a front-end programmer who is used to looking at code, not data. What I mean is: when I am wrangling SQL results or data frames, I look at small and large result sets that all look more or less like each other.
A lot of the time I look at the head of a result frame and it looks alright, but it turns out the middle section of a 25,000-row dataset was incorrect all along.
So how can I tell if the result sets I am wrangling with are correct?
Everything looks so similar and it's easy to make a mistake. Staring at grids of numbers can be confusing.
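To make the problem concrete, here's the kind of thing I mean, sketched in Python/pandas (the data and the "corruption" are made up for illustration): `df.head()` looks fine, but a check that runs over the whole frame catches a bad slice in the middle.

```python
import pandas as pd
import numpy as np

# Hypothetical result frame: 25,000 rows, head looks fine.
df = pd.DataFrame({
    "order_id": range(25_000),
    "price": np.random.uniform(1.0, 100.0, 25_000),
})
# Simulate a silent error in the middle of the dataset.
df.loc[10_000:12_000, "price"] = 0.0

# df.head() would show nothing wrong. Whole-frame checks do:
assert df["order_id"].is_unique, "duplicate keys"
assert df["price"].notna().all(), "unexpected nulls"

# This check catches the simulated mid-file corruption.
bad = int((df["price"] <= 0).sum())
print(f"rows with non-positive price: {bad}")
```

Eyeballing the head would have missed all 2,001 bad rows here, which is exactly my worry.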
I know #rstats packages like visdat offer functions such as vis_compare for comparing similar datasets against each other.
Any other software that can help me track anomalies?
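On the Python side, the closest analogue I've found to that kind of comparison is pandas' built-in `DataFrame.compare`, which surfaces only the cells that differ between two frames (minimal sketch with made-up data):

```python
import pandas as pd

# Two versions of the "same" result set; one cell differs mid-frame.
before = pd.DataFrame({"id": [1, 2, 3], "total": [10.0, 20.0, 30.0]})
after = before.copy()
after.loc[1, "total"] = 99.0

# compare() returns a frame containing only the differing cells,
# with 'self'/'other' columns for the two versions.
diff = before.compare(after)
print(diff)
```

That at least narrows "stare at 25,000 rows" down to "stare at the rows that changed."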
Also, does one develop a certain instinct, or level of confidence, around judging the correctness of one's data as one works with it? Or do you need peers who cross-validate your work for you?
Would appreciate some advice from seasoned data wranglers.