How to extract grouped data into a list

samantha27 · June 7, 2020, 7:21pm

Hi,
I have a data frame that looks like the following:
yellow 1
yellow 0
red NA
red 0
red 0
blue 1
blue 1
blue 0
blue 0
black 0
white 0
white 0
gray 0
I need to create a list, which its characters consist of the values of in the right column, but are grouped by same values in the left column as following:
((1,0), (NA, 0,0), (1,1,0,0), 0,(0,0),0)
Could you please help

FJCC · June 7, 2020, 8:12pm

This is probably more complicated than it needs to be but it works.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
DF <- data.frame(Color = c("Y", "Y", "R", "R", "R", "B", "B", "B", "B","BL", 
                           "W", "W", "G"),
                 VALUE = c(1,0,NA,0,0,1,1,0,0,0,0,0,0))
DF$Color <- factor(DF$Color, levels = c("Y", "R", "B", "BL", "W", "G")) 
DF2 <- DF %>% group_by(Color) %>% 
  summarize(TheList = list(VALUE)) %>% 
  select(TheList)
FinalList <- DF2[[1]]
FinalList
#> [[1]]
#> [1] 1 0
#> 
#> [[2]]
#> [1] NA  0  0
#> 
#> [[3]]
#> [1] 1 1 0 0
#> 
#> [[4]]
#> [1] 0
#> 
#> [[5]]
#> [1] 0 0
#> 
#> [[6]]
#> [1] 0

^{Created on 2020-06-07 by the reprex package (v0.3.0)}

samantha27 · June 7, 2020, 8:34pm

Thank you very much for your help! unfortunately , it didn't work. My final output is all the Color vector.
I'm thinking of a way to split the data, then a receive the colours separated, the problem is that the VALUE is a data frame and I'm not able to use the data. Maybe with a loop, that for each colour extracts the data from the data frame?

samantha27 · June 7, 2020, 8:36pm

in addition, please note that my actual data does not consist of colours, hence I have a lot of 'colours' which are IDs. That means that I'm not able to name each ID (more than 1000..)

FJCC · June 7, 2020, 8:57pm

Can you please post an example of your actual data? The dput command is one way to do that. If your data are in an object called DF, run the command

dput(head(DF))

and then paste the results here. Please put a line with only three back ticks, ```, before and after the output that you paste so it will be formatted as code.

samantha27 · June 7, 2020, 8:59pm

753 0
753 0
828 1
828 1
828 0
828 0
828 0
852 0
852 0
860 0

FJCC · June 7, 2020, 10:03pm

Here is how I would handle your data example. Does this work for you?

library(dplyr)

DF <- data.frame(ID = c(753, 753, 828, 828, 828, 828, 828, 852, 852, 860),
                 VALUE = c(0,0,1,1,0,0,0,0,0,0))
DF$ID <- factor(DF$ID) 
DF2 <- DF %>% group_by(ID) %>% 
  summarize(TheList = list(VALUE)) %>% 
  select(TheList)
FinalList <- DF2[[1]]
names(FinalList) <- levels(DF$ID)
FinalList
#> $`753`
#> [1] 0 0
#> 
#> $`828`
#> [1] 1 1 0 0 0
#> 
#> $`852`
#> [1] 0 0
#> 
#> $`860`
#> [1] 0

^{Created on 2020-06-07 by the reprex package (v0.2.1)}

samantha27 · June 7, 2020, 10:23pm

it worked! thank you very very much!

samantha27 · June 8, 2020, 3:58am

Following the previous question, let's say I need to use the data in the list as following:
for each ID create a matrix : [1,1] [1,0] [0,1] [0,0]
[1,1] = how many "1" values in the vector come right after another "1"
[1,0] = how many "1" values in the vector come right after "0"
[0,1] = how many "0" values in the vector come right after "1"
[0,0] = how many "0" values in the vector come right after "0"
I'm not able to extract the data from the list, I always get the ID but I would like to see for each ID a vector containing the VALUES inside.

FJCC · June 8, 2020, 4:01am

Please post the code that you have tried.

samantha27 · June 8, 2020, 4:03am

When I try to unlist(FinalList)
I get:
[1] NM_013485 NM_001034878 NM_177364 NM_009437 NM_001098810 NM_001039684
[7] NM_183034 NM_133807 NM_144824 NM_010271 NM_001364769 NM_177752
[13] NM_001364771 NM_010930 NM_172557 NM_023651 NM_001081220 NM_001364772
[19] NM_008907 NM_001039536 NM_177566 NM_009622 NM_011057 NM_199152
[25] NM_001171512 NM_008537 NM_010464 NM_033373 NR_157299 NR_157294
[31] NR_157297 NR_157298 NM_029393 NM_001364763 NM_001002764 NM_001364768
[37] NM_029537 NM_001364762 NM_153782 NM_130886 NM_133949 NM_134025
[43] NM_001364757 NM_001364767 NM_170779 NM_001289507 NM_028492 NM_134024
[49] NM_053093 NM_027869 NM_028019 NR_157089 NR_157295 NR_156747
[55] NM_001081343 NM_028466 NM_001359206 NM_015799 NM_001289511 NM_001289509
[61] NM_001277231 NM_007764 NM_027815 NM_029349 NM_173187 NM_033041
[67] NM_172524 NM_145425 NM_176921 NM_029252 NR_157296 NR_157293
[73] NM_011891 NM_001364738 NM_027427 NM_026631 NM_008928 NM_027346 ....

FJCC · June 8, 2020, 4:38am

It is very difficult to debug your problem without seeing the initial data and the code you used. Please post the result of

dput(head(DF))

but replace DF with the name of your data frame. Also post the code you used to generate FinalList.

Running unlist() on FinalList as generated from my code will simply reproduce the VALUE column of DF. Do you really want to do that?

samantha27 · June 8, 2020, 4:44am

dput(head(FinalList)) didn't work.
the code for generation of FinalList is as following:
mini.table<-data.frame(shots$game_id, shots$shot_made_flag)
mini.table[1:5,]
library(dplyr)
ID<-c(mini.table$shots.game_id)
VALUE<-c(mini.table$shots.shot_made_flag)
DF<-data.frame(ID,VALUE)
DF$ID <- factor(DF$ID)
DF2 <- DF %>% group_by(ID) %>%
summarize(TheList = list(VALUE)) %>%
select(TheList)
FinalList <- DF2[[1]]
names(FinalList) <- levels(DF$ID)

the data in the list is ordered like:

I just want to obtain one vector per ID which includes all the values inside, and then I will be able to run a function in order to create a matrix with the 4 results as I've explained above.

FJCC · June 8, 2020, 5:03am

Does this help. Of course, you do not want to print the vectors, you want to use them in a function but this code shows how to access each vector using the name of the list element.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

DF <- data.frame(ID = c(753, 753, 828, 828, 828, 828, 828, 852, 852, 860),
                 VALUE = c(0,0,1,1,0,0,0,0,0,0))
DF$ID <- factor(DF$ID) 
DF2 <- DF %>% group_by(ID) %>% 
  summarize(TheList = list(VALUE)) %>% 
  select(TheList)
FinalList <- DF2[[1]]
names(FinalList) <- levels(DF$ID)
FinalList
#> $`753`
#> [1] 0 0
#> 
#> $`828`
#> [1] 1 1 0 0 0
#> 
#> $`852`
#> [1] 0 0
#> 
#> $`860`
#> [1] 0

FinalList[["753"]] #Look at one element
#> [1] 0 0

#print all of the vectors
for (NAME in levels(DF$ID)) {
  print(NAME)
  print(FinalList[[NAME]])
}
#> [1] "753"
#> [1] 0 0
#> [1] "828"
#> [1] 1 1 0 0 0
#> [1] "852"
#> [1] 0 0
#> [1] "860"
#> [1] 0

^{Created on 2020-06-07 by the reprex package (v0.3.0)}

samantha27 · June 8, 2020, 5:06am

I get this error:
Warning message:
Unknown or uninitialised column: ID.

FJCC · June 8, 2020, 5:15am

Sorry, I am out of time for today. It seems your data frame does not have an ID column. Look at the line where you think you define that column or at earlier lines on which it depends. The command

summary(DF)

will show you which columns the data frame DF has. You can use that on any data frame to confirm it has the columns and the data types you want.

system · June 15, 2020, 5:15am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.