How to print a line by the number of occurrences listed in the dataset?

SterlingWright2016 · March 14, 2019, 8:42pm

Hello,

I have this dataset:

Length Occurrence Location
30 25 New York
60 40 New York
75 20 Philadelphia

I want each row to be printed as many times as its Occurrence value. That is, the row with Length 30 and New York should print 25 times, and the row with Length 75 and Philadelphia should print 20 times. I tried writing a nested for loop but have not been able to get it to work.

Thank you.

andresrcs · March 14, 2019, 9:14pm

I'm not sure what you mean by "print", but I think this is what you want:

df <- data.frame(stringsAsFactors = FALSE,
                 Length = c(30L, 60L, 75L),
                 Occurrence = c(25L, 40L, 20L),
                 Location = c("New York", "New York", "Philadelphia")
)
library(purrr)

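# Iterate over row indices (nrow), not column indices, repeating each row Occurrence times
map2_dfr(seq_len(nrow(df)), df$Occurrence, ~df[rep(.x, times = .y), ])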
#>    Length Occurrence     Location
#> 1      30         25     New York
#> 2      30         25     New York
#> 3      30         25     New York
#> 4      30         25     New York
#> 5      30         25     New York
#> 6      30         25     New York
#> 7      30         25     New York
#> 8      30         25     New York
#> 9      30         25     New York
#> 10     30         25     New York
#> 11     30         25     New York
#> 12     30         25     New York
#> 13     30         25     New York
#> 14     30         25     New York
#> 15     30         25     New York
#> 16     30         25     New York
#> 17     30         25     New York
#> 18     30         25     New York
#> 19     30         25     New York
#> 20     30         25     New York
#> 21     30         25     New York
#> 22     30         25     New York
#> 23     30         25     New York
#> 24     30         25     New York
#> 25     30         25     New York
#> 26     60         40     New York
#> 27     60         40     New York
#> 28     60         40     New York
#> 29     60         40     New York
#> 30     60         40     New York
#> 31     60         40     New York
#> 32     60         40     New York
#> 33     60         40     New York
#> 34     60         40     New York
#> 35     60         40     New York
#> 36     60         40     New York
#> 37     60         40     New York
#> 38     60         40     New York
#> 39     60         40     New York
#> 40     60         40     New York
#> 41     60         40     New York
#> 42     60         40     New York
#> 43     60         40     New York
#> 44     60         40     New York
#> 45     60         40     New York
#> 46     60         40     New York
#> 47     60         40     New York
#> 48     60         40     New York
#> 49     60         40     New York
#> 50     60         40     New York
#> 51     60         40     New York
#> 52     60         40     New York
#> 53     60         40     New York
#> 54     60         40     New York
#> 55     60         40     New York
#> 56     60         40     New York
#> 57     60         40     New York
#> 58     60         40     New York
#> 59     60         40     New York
#> 60     60         40     New York
#> 61     60         40     New York
#> 62     60         40     New York
#> 63     60         40     New York
#> 64     60         40     New York
#> 65     60         40     New York
#> 66     75         20 Philadelphia
#> 67     75         20 Philadelphia
#> 68     75         20 Philadelphia
#> 69     75         20 Philadelphia
#> 70     75         20 Philadelphia
#> 71     75         20 Philadelphia
#> 72     75         20 Philadelphia
#> 73     75         20 Philadelphia
#> 74     75         20 Philadelphia
#> 75     75         20 Philadelphia
#> 76     75         20 Philadelphia
#> 77     75         20 Philadelphia
#> 78     75         20 Philadelphia
#> 79     75         20 Philadelphia
#> 80     75         20 Philadelphia
#> 81     75         20 Philadelphia
#> 82     75         20 Philadelphia
#> 83     75         20 Philadelphia
#> 84     75         20 Philadelphia
#> 85     75         20 Philadelphia

Created on 2019-03-14 by the reprex package (v0.2.1)
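
For comparison, the same expansion can be sketched in base R without a loop or extra packages. This is only an illustration using the example data from this thread; the names `row_index` and `expanded` are made up for the sketch.

```r
df <- data.frame(
  Length = c(30L, 60L, 75L),
  Occurrence = c(25L, 40L, 20L),
  Location = c("New York", "New York", "Philadelphia"),
  stringsAsFactors = FALSE
)

# Repeat each row index as many times as that row's Occurrence value,
# then subset the data frame with the repeated indices.
row_index <- rep(seq_len(nrow(df)), times = df$Occurrence)
expanded <- df[row_index, ]
rownames(expanded) <- NULL  # reset row names to 1..n

nrow(expanded)
#> [1] 85
```

Because `rep()` is vectorised over `times`, no explicit for loop is needed: row 1 is repeated 25 times, row 2 40 times, and row 3 20 times.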

reddyr · March 15, 2019, 2:13am

Just sharing what I know. There is a simple function uncount() to do this in tidyr:

library(tidyverse)
df <- data.frame(stringsAsFactors = FALSE,
                 Length = c(30L, 60L, 75L),
                 Occurrence = c(25L, 40L, 20L),
                 Location = c("New York", "New York", "Philadelphia")
)
df %>% 
    uncount(Occurrence, .remove = FALSE)
#>      Length Occurrence     Location
#> 1        30         25     New York
#> 1.1      30         25     New York
#> 1.2      30         25     New York
#> 1.3      30         25     New York
#> 1.4      30         25     New York
#> 1.5      30         25     New York
#> 1.6      30         25     New York
#> 1.7      30         25     New York
#> 1.8      30         25     New York
#> 1.9      30         25     New York
#> 1.10     30         25     New York
#> 1.11     30         25     New York
#> 1.12     30         25     New York
#> 1.13     30         25     New York
#> 1.14     30         25     New York
#> 1.15     30         25     New York
#> 1.16     30         25     New York
#> 1.17     30         25     New York
#> 1.18     30         25     New York
#> 1.19     30         25     New York
#> 1.20     30         25     New York
#> 1.21     30         25     New York
#> 1.22     30         25     New York
#> 1.23     30         25     New York
#> 1.24     30         25     New York
#> 2        60         40     New York
#> 2.1      60         40     New York
#> 2.2      60         40     New York
#> 2.3      60         40     New York
#> 2.4      60         40     New York
#> 2.5      60         40     New York
#> 2.6      60         40     New York
#> 2.7      60         40     New York
#> 2.8      60         40     New York
#> 2.9      60         40     New York
#> 2.10     60         40     New York
#> 2.11     60         40     New York
#> 2.12     60         40     New York
#> 2.13     60         40     New York
#> 2.14     60         40     New York
#> 2.15     60         40     New York
#> 2.16     60         40     New York
#> 2.17     60         40     New York
#> 2.18     60         40     New York
#> 2.19     60         40     New York
#> 2.20     60         40     New York
#> 2.21     60         40     New York
#> 2.22     60         40     New York
#> 2.23     60         40     New York
#> 2.24     60         40     New York
#> 2.25     60         40     New York
#> 2.26     60         40     New York
#> 2.27     60         40     New York
#> 2.28     60         40     New York
#> 2.29     60         40     New York
#> 2.30     60         40     New York
#> 2.31     60         40     New York
#> 2.32     60         40     New York
#> 2.33     60         40     New York
#> 2.34     60         40     New York
#> 2.35     60         40     New York
#> 2.36     60         40     New York
#> 2.37     60         40     New York
#> 2.38     60         40     New York
#> 2.39     60         40     New York
#> 3        75         20 Philadelphia
#> 3.1      75         20 Philadelphia
#> 3.2      75         20 Philadelphia
#> 3.3      75         20 Philadelphia
#> 3.4      75         20 Philadelphia
#> 3.5      75         20 Philadelphia
#> 3.6      75         20 Philadelphia
#> 3.7      75         20 Philadelphia
#> 3.8      75         20 Philadelphia
#> 3.9      75         20 Philadelphia
#> 3.10     75         20 Philadelphia
#> 3.11     75         20 Philadelphia
#> 3.12     75         20 Philadelphia
#> 3.13     75         20 Philadelphia
#> 3.14     75         20 Philadelphia
#> 3.15     75         20 Philadelphia
#> 3.16     75         20 Philadelphia
#> 3.17     75         20 Philadelphia
#> 3.18     75         20 Philadelphia
#> 3.19     75         20 Philadelphia

Created on 2019-03-14 by the reprex package (v0.2.1)

SterlingWright2016 · March 18, 2019, 7:45pm

Thank you both for the help.

However, I have not been able to get either method to work. Here are the errors that I have been getting.

map2_df(1:dim(TopTENtemp)[2], TopTENtemp$Occurences, ~TopTENtemp[rep(.x, each =.y),])
Error: Mapped vectors must have consistent lengths:

.x has length 4
.y has length 6

TopTENtemp %*%

uncount(TopTENtemp$Occurences, .remove = FALSE)
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "c('integer', 'numeric')"
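
A possible reading of the two errors, offered as a sketch rather than a confirmed diagnosis: `1:dim(TopTENtemp)[2]` counts columns (4) while `TopTENtemp$Occurences` has one value per row (6), so the two mapped vectors differ in length; and `uncount()` appears to have been called with the `Occurences` vector as its first argument instead of the data frame. The real `TopTENtemp` is not shown in the thread, so the data frame below is a hypothetical stand-in with 6 rows and 4 columns; its values and the `Rank` column are invented.

```r
library(purrr)
library(tidyr)

# Hypothetical stand-in for TopTENtemp (6 rows, 4 columns); values are invented
TopTENtemp <- data.frame(
  Rank = 1:6,
  Length = c(30L, 60L, 75L, 20L, 45L, 90L),
  Occurences = c(2L, 3L, 1L, 2L, 1L, 2L),
  Location = c("New York", "New York", "Philadelphia",
               "Boston", "Boston", "Chicago"),
  stringsAsFactors = FALSE
)

# Map over row indices (one per row) so .x and .y have the same length
by_map <- map2_dfr(
  seq_len(nrow(TopTENtemp)),
  TopTENtemp$Occurences,
  ~ TopTENtemp[rep(.x, times = .y), ]
)

# Pass the data frame first and the weights column as a bare name
by_uncount <- uncount(TopTENtemp, Occurences, .remove = FALSE)

# Both results should have one row per counted occurrence
nrow(by_map) == sum(TopTENtemp$Occurences)
nrow(by_uncount) == sum(TopTENtemp$Occurences)
```

With the pipe, the second call would be written as `TopTENtemp %>% uncount(Occurences, .remove = FALSE)`; note that the pipe operator is `%>%`, whereas `%*%` is matrix multiplication.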
