Issue with dput()

rstudio

#1

Hello,

I have read many posts about sharing reproducible sample data using dput(). However, my output does not contain anything like "Structure....".
I suspect that because my data has over 50,000 rows and 10 columns, dput() cannot fit all of the information in the output.
Even when I do:

dput(data[1:10,])

The output still does not have "Structure..."

Members of the R community suggest that posts should include reproducible samples, but since I cannot produce one using dput(), what can I do to write an acceptable post with all the needed information so that my question can be addressed?

Thank you.


#2

I agree that a very large dataset is not a good fit for the dput() strategy (some people will argue that there are very few problems where you really need to include all of a large dataset in your reproducible example). There have been a couple of discussions here with ideas for sharing data beyond dput():

I’m curious what’s going wrong for you when you try dput() while selecting just a few rows. You said you don’t get output that starts with structure() — what do you get? What happens when you try running dput() on a slice of a built-in dataset? For example:

dput(head(ggplot2::diamonds))
(I get this...)
structure(list(carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24), 
    cut = structure(c(5L, 4L, 2L, 4L, 2L, 3L), .Label = c("Fair", 
    "Good", "Very Good", "Premium", "Ideal"), class = c("ordered", 
    "factor")), color = structure(c(2L, 2L, 2L, 6L, 7L, 7L), .Label = c("D", 
    "E", "F", "G", "H", "I", "J"), class = c("ordered", "factor"
    )), clarity = structure(c(2L, 3L, 5L, 4L, 2L, 6L), .Label = c("I1", 
    "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"), class = c("ordered", 
    "factor")), depth = c(61.5, 59.8, 56.9, 62.4, 63.3, 62.8), 
    table = c(55, 61, 65, 58, 58, 57), price = c(326L, 326L, 
    327L, 334L, 335L, 336L), x = c(3.95, 3.89, 4.05, 4.2, 4.34, 
    3.94), y = c(3.98, 3.84, 4.07, 4.23, 4.35, 3.96), z = c(2.43, 
    2.31, 2.31, 2.63, 2.75, 2.48)), .Names = c("carat", "cut", 
"color", "clarity", "depth", "table", "price", "x", "y", "z"), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
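
As a quick sanity check that needs no extra packages, here is a minimal sketch using the built-in mtcars dataset (my choice for illustration, not something from the original question). It captures the dput() text, confirms it begins with structure(, and rebuilds the data frame from that text, which is what someone pasting your reprex would be doing:

```r
# Take a small slice of a built-in dataset (mtcars is only an illustration)
small <- head(mtcars, 3)

# Capture what dput() would print to the console, one element per line
txt <- capture.output(dput(small))

# The first line should begin with "structure("
startsWith(txt[1], "structure(")

# Rebuild the object from the text, as a reader pasting it would
rebuilt <- eval(parse(text = paste(txt, collapse = "\n")))
all.equal(small, rebuilt)
```

If this works on mtcars but not on your own data, the problem is more likely in the data or the console than in dput() itself.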

#3

@jcblum
What I get is a ton of lines. When I scroll all the way up, I reach a point where I cannot scroll any further, and the output is cut off right there.

I did trim down the columns. Otherwise, it would be a mess.
I assume the right syntax for selecting a few rows is the one in my original post above. Am I right?
If not, I don’t know what to do.


#4

Yes, the syntax you used should select the first 10 rows and all columns, so a 10 row x 10 column data frame. That shouldn’t produce an inordinately long amount of text, unless there are some really long values in your data frame?
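
One other possible cause, which nothing in this thread confirms: if a column is a factor, a 10-row slice still carries every level from the full data, and dput() prints all of them. The sketch below uses a made-up data frame (df, id, and value are hypothetical names) to show the effect and how droplevels() trims it:

```r
# Hypothetical data: a factor column with 50,000 distinct levels
df <- data.frame(
  id    = factor(sprintf("id%05d", 1:50000)),
  value = seq_len(50000)
)

slice <- df[1:10, ]

# The slice has 10 rows but keeps all of the original levels
nlevels(slice$id)

# Compare the size of the dput() text before and after dropping unused levels
nchar_full <- sum(nchar(capture.output(dput(slice))))
nchar_trim <- sum(nchar(capture.output(dput(droplevels(slice)))))
c(full = nchar_full, trimmed = nchar_trim)
```

If that is what is happening, dput(droplevels(data[1:10, ])) should give something short enough to paste.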

But either way, if the output is too long for your console's scrollback buffer, you can use sink() to send all the console output to a file instead. For instance:

sink("dput_diamonds.txt")  # output to specified file in working directory
dput(ggplot2::diamonds)
sink()  # cancels sink, output to console again

(the result is a 2.7MB text file :astonished:... diamonds has >50,000 rows)
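
An alternative sketch that skips sink() altogether: dput() takes a file argument, and dget() reads the result back in. The temporary file and the mtcars slice here are just for illustration; for a real post you would use your own data and a path in your working directory.

```r
# A small slice of a built-in dataset, standing in for your own data
small <- head(mtcars, 10)

# Write the dput() text straight to a file instead of the console
out_file <- tempfile(fileext = ".txt")  # illustrative path only
dput(small, file = out_file)

# dget() parses that file and recreates the object
restored <- dget(out_file)
all.equal(small, restored)
```

This avoids forgetting the closing sink() call, which would otherwise leave the console silently redirected.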


#5

@jcblum
Hello,
I was able to write the output to a .txt file.

Thank you for your input.
I appreciate it.


#6

Fantastic! Happy to help :grin:


#7

For this sort of task I also suggest trying wrapr::draw_frame().

cat(wrapr::draw_frame(head(ggplot2::diamonds)))

wrapr::build_frame(
   "carat", "cut"      , "color", "clarity", "depth", "table", "price", "x" , "y" , "z"  |
   0.23   , "Ideal"    , "E"    , "SI2"    , 61.5   , 55     , 326L   , 3.95, 3.98, 2.43 |
   0.21   , "Premium"  , "E"    , "SI1"    , 59.8   , 61     , 326L   , 3.89, 3.84, 2.31 |
   0.23   , "Good"     , "E"    , "VS1"    , 56.9   , 65     , 327L   , 4.05, 4.07, 2.31 |
   0.29   , "Premium"  , "I"    , "VS2"    , 62.4   , 58     , 334L   , 4.2 , 4.23, 2.63 |
   0.31   , "Good"     , "J"    , "SI2"    , 63.3   , 58     , 335L   , 4.34, 4.35, 2.75 |
   0.24   , "Very Good", "J"    , "VVS2"   , 62.8   , 57     , 336L   , 3.94, 3.96, 2.48 )

#8

Ooh, very nice! Can you post that to the FAQ thread about ways of including your data? Ideally with brief, clear instructions for basic use in this context and the best way to get the package, since we send a lot of inexperienced useRs to that thread.


Best Practices: how to prepare your own data for use in a `reprex` if you can’t, or don’t know how to reproduce a problem with a built-in dataset?
FAQ: What's a reproducible example (`reprex`) and how do I do one?

#9

Wow, thanks. I will add it to the FAQ in a bit!