Suggestions for pointing out bad statistical practice

#1

Hello -

Occasionally, I will see questions pop up that indicate the asker is doing something that is not statistically sound. The underlying question is about the code, and I may be able to help them with code that will produce what they want. However, the 'should they be doing this?' question seems to be more relevant. I understand that what constitutes 'statistically sound' is often up for debate, but sometimes what is being done is so blatantly wrong that I feel uncomfortable helping with the code to do it.

I've hesitated in the past to point these things out within the thread, to 1) avoid being rude and 2) avoid changing the subject of the thread.

Has anyone else encountered this issue? What are your thoughts/suggestions?

Thanks!


#2

I think that as long as you're polite about it, and clear that it's an aside, it's totally fine to point it out. In fact, you might be doing them a big favour!


#3

I always try to point out this kind of thing (if I notice it! code-puzzle tunnel vision happens), but in doing so I try to keep in mind that I rarely know the whole context of what other people are doing, so humility is called for. I also try to make a positive suggestion about what to do instead of the questionable practice (and how to do it in code, if applicable). And I apply the classic critical feedback rule of thumb: criticize the practice, not the person. Nobody is dumb for making a mistake, and lots of problematic practices seem intuitive.

So my personal rubric:

  • :+1: "It sounds like you might be trying to {questionable practice}? The problem I see is, {brief summary of the argument against it, ideally with links to learn more}. You might consider doing {better option}, instead."
  • :-1: "Don't do that, it is Bad Stats" (too high-handed) or "Why would you even do that?" (implies asker is dumb)

#4

If the question comes to me from someone in the lab, I would normally be polite and ask why he/she wants to do it, or why they are not doing "whatever else would sound better to me" (if I know of an alternative, of course). If it is a simple task, I would still help, but if it would take me a long time (essentially, building a model in JAGS or TMB) I'll just point to resources that may help, as it is not my job and I cannot regularly allocate several hours to prototyping, testing, and polishing other people's work, unless the question is very interesting to me too. I would not say things like "this is a wrong approach" or anything similar, because the choice of method could have come from the person's superior (for example), and I am not someone who likes to stir up disagreements.
If the question is in a public forum, like here or Stack Overflow, I would probably ignore it unless I have very strong thoughts about the problem with the approach and a more suitable alternative. A question is a question, and should be answered or ignored, but not changed, as it will remain there for the future, and people who actually need what the question asks for may land on it through online searches.
There is one bad practice that I cannot stand: naming datasets like my_super_Data_collected_by_me_in_2008 and then my_super_Data_collected_by_me_in_2009, and variables like Tail_length, Time of Day, with consistency close to zero if not negative... The first time, I would politely recommend using shorter names, but the third time coming from the same person, I would return the data frames as d09, d10 and the columns named V1, V2, V3..., hoping that the person would be able to figure out how to rename the columns of a data.frame, ha ha ha.


#5

I think this community differs from StackOverflow in this respect. Self-contained questions that stay on one topic are great, and make for a certain platonic ideal (and are part of what makes StackOverflow so valuable). However, there's a bit more latitude here in terms of back-and-forth, suggestions, and follow-up. As you've described with naming datasets and variables, etc., the appropriate action varies with the individual and your experience with them. I think that there are threads where pointing out resources that might add enlightenment re. statistical assumptions could be very valuable.

Of course, there's no hard and fast rule, in the end.


#6

Yes @mara! I certainly agree that this is no Stack Overflow. This is a smaller but much more human-friendly community, and really useful, at least for me :slight_smile: . And I agree that threads may also evolve naturally through users' interactions; in my answer I was maybe more focused on the other, crazier cases.
And yes, I do that with people I know (I also go to pubs with some of them every week) who are just starting out or are not yet big R users. If they don't see the benefits of short and concise names, at least I may be able to make them learn that it can be fixed with names(Data) <- names(Super_Awful_AND_inconsistent-dataFRAME). I am happy to write a Metropolis-Hastings algorithm for them if they also put in some effort on their side, not just buy me beers, for which, of course, I am super happy and consider myself very well paid :grinning:


#7

I have learned over time that there is no such thing as a simple statistical problem, and that every problem has its own quirks. So to my mind it is often necessary to ask for more background before giving code advice. I say a lot of things starting with "I think", or "I'm not sure about this because".

But I think we as statisticians have an obligation to point out bad (or iffy) practice, no matter where it comes from. In the service of the science, if you will.


#8

I think there is an obligation to point out possible mistakes. However, realize that the "mistake" can arise in several ways, and the problem may lie with the person asking the question or with the person reading it. There is a skill in writing a good question. There is also an issue in interpreting the question.
There may be a language issue but your skills at Klingon (my native language) are much worse than my skills in your native language.
The point is that you should discuss possible mistakes with the realization that the mistake might be yours. However, it might be important to clear up the issue because in doing so you both have a better understanding of the problem.
Keep in mind that people asking questions can be from all backgrounds and all skill levels. Not all scientific disciplines evolve at the same rate. I might want to use Bayesian analysis but the people around me want fractional replication with four replicates to be analyzed with LSD tests so that farmers will understand the output. Oh, some trials also use repeated measures design on top of this. You point out the problems and my response is to ask you to come and teach my boss statistics, or maybe teaching all the farmers statistics would be a more effective approach.
One start could be to ask for clarification on some parts of the question that look wrong. Maybe suggest that "this" answers the question asked but is only appropriate if "this" is true.


#9

I've run into this a bit when coworkers ask me for help. I don't think you're ever obligated to point this kind of thing out when you are just helping with code (and not involved in the analysis or managing the project). I do think not asking questions is a missed opportunity though.

I (and hopefully my coworkers as well) learn a lot by asking things like this. I usually ask people why they are doing something in a certain way (and then if they've considered x, y, and z). Usually I am not thinking of some other complexities of their data (or, in some cases, what the client wants to pay for; weird things can happen when non-analytical people start playing telephone between two disjoint analytics teams). Or my coworkers went with the approach I'm thinking of first, but it ran into other issues, so they had to get creative...

Realistically, in a lot of cases, to run any model or analysis you're making huge assumptions (and in our field there's almost always a bit of art involved because of covariance issues). It's also better for us to have a model that's 51% right than no model at all. So we do what we can (and caveat everything).