Choosing between this site and StackOverflow for posting a question


#42

I disagree with this (respectfully, of course), and I actually just did a little write-up on how I did/why I think I benefitted from the “association bonus” thing.

The aspect of what you’re saying that would have been very difficult for me to empathize with before spending time answering questions elsewhere, is that there’s an “abuse” (not in the violent sense of the word, in the not-using-as-intended sense) issue with a lot of drive-by poor-quality question asking (all of this is ~better described in the post I wrote). The Help Vampire thread has :+1: advice on the ways SE/SO tries to deal with this, and I by no means think it’s perfect. But, if new users had carte blanche, I don’t think the site would be the resource it is today (I have no Bayesian counterfactual, but, for all its shortcomings, the structure of SO is pretty rigorously discussed/thought through).


#43

To show why it’s not that way, I point you back to an important sentence from Frank:

Weird as it may sound, SO is not about helping you answer your question (directly, anyway). Instead, the tour tells you

With your help, we’re working together to build a library of detailed answers to every question about programming.

i.e. SO is about helping build a resource that answers your question. That explains exactly why SO is structured as it is, e.g. why asking a question is hard:

At Stack Exchange, we insist that people who ask questions put some effort into their question, and we’re kind of jerks about it.

But for good reason: we’re not-so-subtly trying to help you help yourself, by teaching you Rubber Duck problem solving.

Jeff Atwood, “Rubber Duck Problem Solving”

and answers are valued so much:

Incoming questions are a universal constant, all around us in countless billions. But answers — truly brilliant, amazing, correct answers — are as rare as pearls. Thus, questions are merely the sand that produces the pearl. If we have learned anything in the last three years, it is that you optimize for pearls, not sand.

Jeff Atwood, “Optimizing For Pearls, Not Sand”

A repository of high-quality answers is the goal because

It is probably getting difficult to imagine what a programmer’s life was like BSO (Before Stack Overflow, prior to 2008). Back when Joel Spolsky and Jeff Atwood were still programming for a living. And ran into the same problem that everybody was experiencing back then, finding help to get you unstuck to solve a programming problem was hard work back then.

You would be lucky if you found a FAQ or knowledge base article on a vendor’s site. Low odds for that after ~2000, vendors started to rely on their forums as their primary way to provide support.

If you would not be so lucky, and very common, you’d hit the paywall of a sleazy web site like expertsexchange.com. A web site that did more than any other to formulate the founders’ ideas of what a useful site should look like. They took answers from volunteers but charged a subscription fee for anybody to look at those answers.

But most commonly, you’d have to dig through hits for Usenet posts and programmer forums that touched on the same subject. But maddeningly poorly curated, you’d have to sift through hundreds of pages worth of chit-chat and people calling each other names. Often not providing an answer at all. Or resembling an answer but not in any way an accurate one, just blind guesses that you could only weigh by having to read on for the “it doesn’t work!” follow-up posts.

So Spolsky and Atwood set out to do something about it. Core ideas where a site that’s strictly Q+A, no chit-chat or discussion, just questions and answers strictly separated. And a means to get the true answer to the top efficiently by voting. And strongly avoiding a glut of duplicate questions to limit the amount of Google hits anybody has to scan. And, after a fat year, focusing only on true programming problems.

Very successful of course, SO was a strong magnet for subject experts that were pretty happy about the focus, providing excellent answers. Most programmers that asked a question could get a great answer in less than 10 minutes. It quickly overtook any other web site in Google ranking, nobody else comes close.

Hans Passant

The question that that was designed to answer and its other answers are a gold mine for understanding SO, and are well worth a deeper look:

https://meta.stackoverflow.com/questions/254770/what-is-stack-overflow-s-goal/254973

The goal of SO provides a justification for Baptiste’s point, as well:

If the goal of the community is similar to SO, that’s correct: there are no high-quality answers without capable answerers. If the goal is different—likely, or Discourse is an odd choice—experts may or may not be necessary (to be a gathering place for new useRs to commiserate is a valid goal), though it’s undeniable experts would make the community considerably more valuable.


#44

@mara, @alistaire, good answers, thank you :handshake:


#45

One under-sung option for SO rep (imho) is that, once you’ve gotten 200 points on any of the SE sites, you automatically get 100 on the others. I think answering questions on the other sites also teaches you quite a bit about helpful question-asking, and gives you empathy for both sides of the Q&A. Personally, I did this through WebApps and English (I can’t remember which was first), but there are a ton of SE sites to choose from.

I did this with the videogames one :rofl:


#46

I just saw a question on this site (which was answered) that was probably more appropriate for SO. I’d be concerned that over time the same sort of coding questions come up here again and again, which is something that SO can gatekeep.

Probably nobody has time on their hands, but what might be useful here would be a daily list/link of new questions on SO related to RStudio, e.g. tidyverse, RMarkdown, leaflet etc.


#47

In the intro to R4DS, @hadley and @garrett give a bit of an outline re. getting help (in the aptly-named section 1.6 Getting help and learning more).

If I were to summarise it visually (and add a dash of my poorly-worded versions of @jennybryan’s description of good reprex-ing from the rOpenSci community call) it might look something like this…

I don’t think this is a complete flow chart of how to get help. There are elements of getting help that might be outside the scope of R4DS— perusing GitHub issues, this community site (which, AFAIK, didn’t exist when the book went to print), etc., and I don’t think it was intended as an exhaustive list. However, I think it might be worth thinking about how we could flesh something like this out a bit more…or adapting it to certain contexts (e.g. @jessemaegan, there are parts of the r4ds learning community that might fit in here, but that aren’t universal).


#48

But part of what I like about this site, as a complement to SO, is … what if we don’t have to gatekeep and stress out so much re: duplicates? Maybe it’s ok that certain questions come up repeatedly because it’s always new to someone? In the limit – all questions are very elementary and repetitive – that would, of course, be bad. If that seems to be a real problem, perhaps there’s a way to tag or relocate such questions.


#49

Perhaps gatekeep wasn’t the best word. Maybe triage. Subsequently I found that this site does compare questions (just not very well).


#50

If my issue was an error and I thought it was related to a package, I’d tend to go to GitHub issues before Stack Overflow.

I tend to use the latter more when I want to get from A to B but haven’t even coded enough to produce an error.


#51

This is a really good point, and I think it actually relates to what @pssguy is describing re. question comparisons— I like that the comparisons exist on SO, however the comparisons are only based on certain variables. And, for a beginner (or anyone not familiar with the vocabulary around the problems that come up) two similar questions might not actually seem to be the same thing.

It’s also possible that they aren’t the same thing (because of time and OS changes, or things that no longer exist, etc.), but that can be an intimidating thing to declare on SO:
"My very similar question is definitely different from all the others."
is (in my mind) a bold statement in the world of SO.


#52

I’d like to add to this discussion that asking questions about topics that touch on best practices in coding or on other programming languages (e.g. web development stuff for shiny) might result in much more empathetic answers if people share the same background: specialist in R, but maybe not in computer science in general.


#53

SO doesn’t really gatekeep; anybody is free to ask any question. Questions may get closed (and deleted if they are spam/useless), but SO faces a stream of questions without reprexes, duplicates, and more. Some get fixed, some get closed, and some unfortunately become tumbleweeds, which is really worse than getting closed as a duplicate, as the asker never gets an answer.

As far as this site goes,

  • When a question has a good reprex and isn’t a very common one, the community should perhaps respond, “Hey, this is a good question! You should ask it on SO,” which gives the asker a confidence bump and makes it easier for people to find the question later.

  • If the question is a very common one (e.g. one of SO’s r-faqs), linking to the canonical question on SO, e.g. for joins or long to wide form or summarizing multiple variables or making lists of data.frames can be really useful. There’s not really a point in deduplicating here, but there is a point in directing the asker to a set of solid answers. Translating the answer to their context (i.e. answering) is nice as well, but sometimes pointing askers to the right resource is actually all they really need.

  • Other questions (opinion-based, looking for tools, etc.) fit more neatly here than SO.

The SO API is good, and these could be nailed down to a particular set of parameters (say top 10 Qs and As with most votes on a given set of tags), so this wouldn’t necessarily be complicated if there’s interest. The top questions usually get answers, though, so they’re more use for reading than answering, if that’s the intent.

If bold, it’s not uncommon. For it to be useful, it has to be followed with a distinction, though. I see a lot of XY questions that make that claim but don’t illustrate a difference in their reprex, and the difference is only teased out after lots of comments (if the question even gets that far).


#54

Ok, I got curious, which led me to the SO API docs, which naturally led to

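library(purrr)

# Fetch the n top-voted questions carrying the given tags from the SO API
top_so <- function(n = 10, 
                   tags = tidyverse::tidyverse_packages(), 
                   from = Sys.Date() - 1, 
                   to = Sys.Date()) {
    # `path` replaces the path of the base URL, so it includes the API version
    response <- httr::GET('https://api.stackexchange.com/', 
                          path = '2.2/search',
                          query = list(pagesize = n,
                                       # dates are sent as Unix epoch seconds
                                       fromdate = as.integer(lubridate::as_datetime(from)),
                                       todate = as.integer(lubridate::as_datetime(to)),
                                       order = 'desc',
                                       sort = 'votes',
                                       tagged = paste(tags, collapse = ';'),
                                       site = 'stackoverflow'))
    content <- httr::content(response)
    content %>% 
        pluck('items') %>% 
        # flatten the nested `owner` list into owner_* fields on each item
        map(~c(.x[names(.x) != 'owner'], set_names(.x$owner, ~paste0('owner_', .x)))) %>% 
        # turn the list of items into a list of columns
        transpose() %>% 
        # fields missing from an item come back NULL; fill them with NA
        modify_depth(2, ~.x %||% NA_integer_) %>% 
        simplify_all() %>% 
        tibble::as_tibble()
}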

top_tidy_qs <- top_so(from = Sys.Date() - 7)

top_tidy_qs
#> # A tibble: 10 x 19
#>          tags is_answered view_count accepted_answer_id answer_count score
#>        <list>       <lgl>      <int>              <int>        <int> <int>
#>  1 <list [3]>        TRUE        310           46438286            5     7
#>  2 <list [3]>        TRUE         67           46411437            2     5
#>  3 <list [2]>        TRUE         16           46452389            2     4
#>  4 <list [2]>        TRUE         40           46459423            2     4
#>  5 <list [4]>        TRUE         78           46404283            5     3
#>  6 <list [3]>        TRUE         63           46395144            2     3
#>  7 <list [3]>        TRUE         62           46416114            2     3
#>  8 <list [2]>        TRUE         60           46400226            1     3
#>  9 <list [3]>       FALSE         48                 NA            1     3
#> 10 <list [2]>        TRUE         38           46410963            2     3
#> # ... with 13 more variables: last_activity_date <int>,
#> #   creation_date <int>, last_edit_date <int>, question_id <int>,
#> #   link <chr>, title <chr>, owner_reputation <int>, owner_user_id <int>,
#> #   owner_user_type <chr>, owner_accept_rate <int>,
#> #   owner_profile_image <chr>, owner_display_name <chr>, owner_link <chr>

top_tidy_qs %>% dplyr::select(score, answer_count, title)
#> # A tibble: 10 x 3
#>    score answer_count                                                                                                  title
#>    <int>        <int>                                                                                                  <chr>
#>  1     7            5                                    How do I do a rolling cumsum over consecutive rows of a tibble in R
#>  2     5            2                                                                         Index of next occurring record
#>  3     4            2                                                            R cumsum with multiplication based on value
#>  4     4            2           Change axis labels when plotting a numerical vector against an &quot;as.numeric&quot; factor
#>  5     3            5                                                                              Conditional non-equi join
#>  6     3            2                                                            apply function to grouped rows in dataframe
#>  7     3            2 How to create box plots with all points where for each group, the color of the points can be assigned
#>  8     3            1                                                  Get element number of list while iterating through it
#>  9     3            1                                       How to create a ggplot 2 spaghetti plot for a 2x2x2 design in R?
#> 10     3            2                                  Replacing punctuation in string in different ways by word length in R

Warnings: This code is rough and may fail. Don’t abuse the API (Terms); if you plan to use this as more than a curiosity, register an app.
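
For anyone who wants to check what gets sent before touching the API, here is a rough sketch that only builds the query list and makes no network call. `build_so_query()` and the two default tags are made up for illustration (they are not part of the function above), and the date conversion is a base R stand-in that I assume matches what `lubridate::as_datetime()` gives for a `Date`, i.e. midnight UTC as epoch seconds.

```r
# Hypothetical helper (not from the thread): builds the query list only,
# with no network call, so the parameters can be inspected offline.
build_so_query <- function(n = 10,
                           tags = c("dplyr", "ggplot2"),
                           from = Sys.Date() - 1,
                           to = Sys.Date()) {
  # A Date is stored as days since 1970-01-01, so multiplying by the number
  # of seconds in a day gives epoch seconds at midnight UTC.
  to_epoch <- function(d) as.integer(as.numeric(as.Date(d)) * 86400)
  list(
    pagesize = n,
    fromdate = to_epoch(from),
    todate   = to_epoch(to),
    order    = "desc",
    sort     = "votes",
    tagged   = paste(tags, collapse = ";"),  # tags are semicolon-delimited
    site     = "stackoverflow"
  )
}

q <- build_so_query(from = as.Date("2017-09-20"), to = as.Date("2017-09-27"))
str(q)
```

Keeping the query separate from the request makes it easier to see that the date window and tag string are what you intended before spending any of your API quota.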


#55

Cool! Now it would be nice if I could click the links in the link column when viewing the top_tidy_qs dataframe. I guess I’ll look for the RStudio feature request thread and post it there!


#56

I’ve seen that in iTerm2 (including when running R interactively), in which you can click on any URL/file path/email address/etc. when holding CMD, but I suspect implementing it was a lot of work. For now, utils::browseURL works, e.g.

browseURL(top_tidy_qs$link[1])

#57

Do you mean something like this?

library(DT)
library(dplyr)

top_tidy_qs %>% 
  # build one HTML link per question; the href is quoted so the attribute is well-formed
  mutate(question = paste0("<a href='", link, "'>", title, "</a>")) %>%
  select(question) %>%
  datatable(
    class = 'compact stripe hover row-border order-column',
    rownames = FALSE,
    escape = FALSE,  # render the HTML instead of showing it as text
    options = list(
      paging = FALSE,
      searching = FALSE,
      info = FALSE
    )
  )
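
The link-building step in the `mutate()` call above could also be pulled into a small helper, which is handy because `escape = FALSE` means whatever string you build is rendered as raw HTML. This is only a sketch: `make_link()` and the example URLs are hypothetical, and it assumes titles arrive already HTML-encoded, as the `&quot;` in the printed titles earlier in the thread suggests.

```r
# Hypothetical helper (not from the thread): wraps a title in an anchor tag
# with a double-quoted href.
make_link <- function(url, title) {
  # Escape "&" and double quotes so the URL cannot break out of the attribute;
  # "&" goes first so the "&quot;" produced next is not escaped again.
  url <- gsub("&", "&amp;", url, fixed = TRUE)
  url <- gsub('"', "&quot;", url, fixed = TRUE)
  # The title is left untouched on the assumption that it is already
  # HTML-encoded; escaping it again would display the raw entity text.
  sprintf('<a href="%s">%s</a>', url, title)
}

links <- make_link(
  c("https://stackoverflow.com/q/1", "https://example.com/?a=1&b=2"),
  c("First title", "Second title")
)
print(links)
```

With that in place, the pipeline step would become `mutate(question = make_link(link, title))`.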

#59

Incredible! I’ll definitely be leveling up on SO pretty fast now.


#60

I knocked up a shiny app for this

Just enter any tags and you get a table of results with clickable link to individual questions


#61

Looks good, but it’d be more useful to be able to query the top-scoring questions. Actually, it’s sometimes more interesting to look at the top-scoring answers, which tend to outscore their questions significantly.


#62

Might be more useful for you. Others might want to look regularly and see latest.
I put in a nominal 100 limit
Just reduce your tags, sort on score, change the dates and you can handle your requirements - or change one line of code :grin:
I agree about the top-scoring answers being useful and I’d also like to see the answerer
(TBH that might be the most useful parameter of all)
However, I don’t think these currently form part of the API. NB: any SO staff reading might be able to help with that.