Ranking numeric data for each individual in a group year.

ringtailedlemur · May 30, 2022, 4:28pm

Hello R community,

I've calculated some DCSI scores, and I'm interested in the top three scores an individual has for a given year. I'm having trouble ranking them. When I calculate ranks on the DCSI scores, it does so across the entire data set, whereas I want it to rank the DCSI scores for each individual separately. Here is my code:

# Calculate top 3 partners

#Split grooming data by groupyear
groom.prox.DCSI %<>%
mutate(unique.id = paste(groupyear, monkey.id, sep="_"))

DCSI.split <- split(groom.prox.DCSI, list(groom.prox.DCSI$unique.id))

calc_ranks <- function(i){
print(i)

i %<>%
mutate(partner.rank = rank(DCSI))

i %<>%
filter(partner.rank == "1",
partner.rank == "2",
partner.rank == "3")

Can anyone see where i'm going wrong?

Dissipation123 · May 30, 2022, 5:22pm

Hello!

I'm not sure what a DCSI score is used for, but let's assume an individual receives several of them in a year.

Splitting the dataset to create group ranks may not be the right approach here. Assuming you have dplyr loaded (since you are using mutate()), you can group by year and individual with group_by(), which removes the need to create the concatenated groupyear_monkey.id column.

# Maybe something like this
library(dplyr)

df <- groom.prox.DCSI %>%
  group_by(monkey.id, groupyear) %>%
  # keep `mutate(` on one line: a line break before `(` ends the pipe early
  mutate(
    partner.rank = rank(DCSI, ties.method = "average") # or whichever tie method you prefer
  ) %>%
  ungroup()

There is an old Stack Overflow thread on this that could provide more context if this isn't what you need: How to rank within groups in R? - Stack Overflow
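One more note on the filter() step in your original function: as far as I can tell, conditions separated by commas inside filter() are combined with AND, so a rank can never be 1, 2 and 3 at the same time and every row gets dropped. Here is a minimal sketch of the difference, using a made-up data frame (toy and its values are hypothetical, not your data):

```r
library(dplyr)

# Hypothetical toy data: one focal monkey with five ranked partners
toy <- data.frame(
  partner.id   = c("A", "B", "C", "D", "E"),
  partner.rank = c(1, 2, 3, 4, 5)
)

# Comma-separated conditions are ANDed: no rank equals 1, 2 and 3 at once
none <- toy %>%
  filter(partner.rank == 1, partner.rank == 2, partner.rank == 3)

# %in% keeps a row when its rank matches any of the listed values
top3 <- toy %>%
  filter(partner.rank %in% 1:3)

nrow(none) # 0 rows
nrow(top3) # 3 rows
```

Note also that the ranks are numbers, so they don't need to be quoted.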

ringtailedlemur · May 31, 2022, 1:12pm

Thanks so much for your reply.
A DCSI score is a social relationship measure used in primates; it's a composite score between 0 and 1 that combines grooming and proximity, so effectively a measure of relationship strength.
In my study, individuals have multiple social relationships in the group they live in, and these also vary from year to year.
I ran the code you suggested and it came up with the following error message:
Error in UseMethod("ungroup") :
no applicable method for 'ungroup' applied to an object of class "c('double', 'numeric')"

I was using the package plyr, but switched to dplyr and still got the same error message. In case it wasn't clear, groom.prox.DCSI is the name of my data set.
Thanks for your help! I looked at the link you sent, but I don't think I can do it that way. I have a very large data set, with different group sizes and different numbers in each year, so I couldn't use something like "local data frame [12 x 4]" where you need to specify row numbers, as these vary between groups.

Dissipation123 · June 2, 2022, 1:22am

Hmm that's strange that the ungroup() is creating that error. You could try to run the code without the:
" %>%
ungroup()"

as that just ungroups the group_by statement in case you want to run other things on the dataset. Group sizes and numbers shouldn't change how the groupings work.
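To illustrate that last point, here is a small sketch with made-up data (the toy data frame, its IDs and its scores are all hypothetical): one monkey has two partners and the other has four, and each set is ranked on its own without specifying any row numbers.

```r
library(dplyr)

# Hypothetical toy data: monkey "m1" has 2 partners, "m2" has 4
toy <- data.frame(
  groupyear = c("F2010", "F2010", "F2011", "F2011", "F2011", "F2011"),
  monkey.id = c("m1", "m1", "m2", "m2", "m2", "m2"),
  DCSI      = c(0.5, 2.0, 1.0, 3.0, 0.2, 4.0)
)

# Ranks restart within every monkey.id / groupyear combination
ranked <- toy %>%
  group_by(monkey.id, groupyear) %>%
  mutate(partner.rank = rank(DCSI, ties.method = "average")) %>%
  ungroup()

ranked$partner.rank # 1 2 for m1, then 2 3 1 4 for m2
```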

Could you use dput() to get the code to reproduce the first 10-20 rows of the dataset? I'm having a hard time conceptualizing the dataset unfortunately. r - Example of using dput() - Stack Overflow

ringtailedlemur · June 2, 2022, 11:27am

Thanks for your help. I have used dput() as you suggested and will post the code below. One thing I should note is that there are many different monkey IDs, but the first 20 lines only show information for one individual, "00J". It's a very large dataset, and for each individual I've calculated social relationship strengths with all members of the group. What I want to do is rank those. Those with the highest DCSI scores are the strongest relationships. So I want to rank them to see which individuals a focal monkey has its strongest relationships with, then eventually extract the top 3 partners in terms of highest DCSI score. (I should also note there are different groups; this monkey just happens to be in group F. And a monkey only has relationships with members of its own group.)

df <- structure(list(groupyear = c("F2010", "F2010", "F2010", "F2010",
"F2010", "F2010", "F2010", "F2010", "F2010", "F2010", "F2010",
"F2010", "F2010", "F2010", "F2010", "F2010", "F2010", "F2010",
"F2010", "F2010"), monkey.id = c("00J", "00J", "00J", "00J",
"00J", "00J", "00J", "00J", "00J", "00J", "00J", "00J", "00J",
"00J", "00J", "00J", "00J", "00J", "00J", "00J"), partner.id = c("00O",
"03J", "04N", "10S", "14I", "14L", "17C", "20F", "24B", "28J",
"28N", "29B", "29Z", "32S", "34N", "35N", "36Z", "40L", "42L",
"43F"), data.year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L), current.group = c("F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F"), DCSI = c(0, 1.27636402820631, 0, 1.40971549383981,
0, 0.755607504698138, 0, 0, 0, 0, 0.646924233474433, 0, 0, 0.638182014103157,
0, 0.638182014103157, 9.14323496669351, 0.694492191818142, 0,
0)), row.names = c(NA, 20L), class = "data.frame")

Really appreciate your help.

Dissipation123 · June 3, 2022, 3:15am

Thanks for providing the dput(). It makes things a bit clearer.

Running the following code on the dataset (df) you provided narrows it down to each individual's top 3 DCSI scores for each year. One thing to keep in mind is that I have ties.method = "min" set for rank(), which means you could technically keep more than 3 partners if there are ties.

dplyr has other ranking options as well (check out the dense_rank() function, for example): cheatsheets/data-transformation.pdf at main · rstudio/cheatsheets · GitHub
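As a rough comparison of how these ranking functions treat ties, here is a small sketch on a made-up vector (the scores values are hypothetical). Negating the scores, or wrapping them in desc(), gives the highest score rank 1.

```r
library(dplyr)

# Hypothetical scores with a tie for second place
scores <- c(9.1, 1.4, 1.4, 0.7)

r_min   <- rank(-scores, ties.method = "min") # ties share the lowest rank, next rank is skipped
r_dplyr <- min_rank(desc(scores))             # dplyr equivalent of the line above
r_dense <- dense_rank(desc(scores))           # ties share a rank, no gap afterwards
r_row   <- row_number(desc(scores))           # ties broken by position, ranks are unique

r_min   # 1 2 2 4
r_dense # 1 2 2 3
r_row   # 1 2 3 4
```

Which one you want depends on whether tied partners should both count towards the top 3.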


library(dplyr)

df2 <- df %>%
  group_by(monkey.id, groupyear) %>% # criteria for how you'd like to group rankings
  mutate(
    # negate DCSI so the highest score gets rank 1; tied scores share the lowest rank
    partner.rank = rank(-DCSI, ties.method = "min")
  ) %>%
  filter(partner.rank < 4) # keeps the top 3 results - be mindful of ties
# Change this filter or remove it if you want the whole dataset
# Add %>% ungroup() if you want to make other aggregations
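If you only need the top rows and not the rank column itself, slice_max() might be a shorter route. This is just a sketch on made-up data (the toy data frame is hypothetical), and it assumes dplyr 1.0.0 or later, where slice_max() is available. Its with_ties argument controls whether tied rows are all kept.

```r
library(dplyr)

# Hypothetical toy data: one focal monkey, five partners, a tie for third place
toy <- data.frame(
  groupyear  = "F2010",
  monkey.id  = "00J",
  partner.id = c("p1", "p2", "p3", "p4", "p5"),
  DCSI       = c(9.1, 1.4, 0.6, 0.6, 0)
)

# with_ties = TRUE (the default) keeps every row tied for the last place
top3_ties <- toy %>%
  group_by(monkey.id, groupyear) %>%
  slice_max(DCSI, n = 3) %>%
  ungroup()

# with_ties = FALSE returns at most 3 rows per group
top3_strict <- toy %>%
  group_by(monkey.id, groupyear) %>%
  slice_max(DCSI, n = 3, with_ties = FALSE) %>%
  ungroup()

nrow(top3_ties)   # 4 rows, because p3 and p4 are tied
nrow(top3_strict) # 3 rows
```

The result of my own code above, run on your df, is what follows.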

`DPUT() below`

df2 <-
structure(list(groupyear = c("F2010", "F2010", "F2010"), monkey.id = c("00J",
"00J", "00J"), partner.id = c("03J", "10S", "36Z"), data.year = c(2010L,
2010L, 2010L), current.group = c("F", "F", "F"), DCSI = c(1.27636402820631,
1.40971549383981, 9.14323496669351), partner.rank = 3:1), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), groups = structure(list(
monkey.id = "00J", groupyear = "F2010", .rows = structure(list(
1:3), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))

ringtailedlemur · June 3, 2022, 11:58am

Thanks so much for your help, really appreciate it! That worked really nicely.
Sorry to keep asking for help, but I have one final thing I'd like to do with the data and can't figure out how to do it. Would you be able to help?
So I ran the code, and it created a new dataset with each individual's top 3 partners and their ranks (thank you). I then created a binary variable to code whether each of their top 3 partners is non-kin, i.e. 0 for kin and 1 for non-kin.
What I now want to know is how many of their top 3 partners are non-kin: a count, for each individual in each year, of how many of their top 3 partners were non-kin. But I can't figure out how to do it.
This is the dput code:
df3 <- structure(list(groupyear = c("F2010", "F2010", "F2010", "F2010",
"F2010", "F2010", "F2010", "F2010", "F2010", "F2010", "F2010",
"F2010", "F2010", "F2010", "F2010", "F2010", "F2010", "F2010",
"F2010", "F2010"), monkey.id = c("00J", "00J", "00J", "00O",
"00O", "00O", "03J", "03J", "03J", "04N", "04N", "04N", "10S",
"10S", "10S", "14I", "14I", "14I", "14L", "14L"), partner.id = c("36Z",
"62V", "63V", "44J", "55V", "90D", "61J", "62V", "00J", "17C",
"24B", "98T", "14L", "64P", "68V", "29Z", "78T", "V36", "17C",
"68V"), data.year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L), current.group = c("F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F"), DCSI = c(9.14323496669351, 5.24727433818152,
15.0677748544049, 13.3320473141112, 12.270117025579, 71.2592375296376,
15.6726142511153, 3.99126543701154, 1.27636402820631, 24.5036860857084,
7.72561755609198, 16.2382372788447, 18.5164970827263, 17.3079133497021,
15.9792558738643, 13.8380007377055, 40.1022560899402, 15.9123236938435,
14.3415425751191, 16.1986973817819), kinship = c("kin", "kin",
"nonkin", "nonkin", "nonkin", "kin", "kin", "nonkin", "nonkin",
"kin", "nonkin", "kin", "kin", "nonkin", "nonkin", "nonkin",
"kin", "nonkin", "nonkin", "nonkin"), partner.rank = c(2L, 3L,
1L, 2L, 3L, 1L, 1L, 2L, 3L, 1L, 3L, 2L, 1L, 2L, 3L, 3L, 1L, 2L,
3L, 2L), nonkinpresence = c("0", "0", "1", "1", "1", "0", "0",
"1", "1", "0", "1", "0", "0", "1", "1", "1", "0", "1", "1", "1"
)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-20L), groups = structure(list(monkey.id = c("00J", "00O", "03J",
"04N", "10S", "14I", "14L"), groupyear = c("F2010", "F2010",
"F2010", "F2010", "F2010", "F2010", "F2010"), .rows = structure(list(
1:3, 4:6, 7:9, 10:12, 13:15, 16:18, 19:20), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), .drop = TRUE))

Do you know how I might do this?

(edit: Have just realised that I didn't need to create the binary variable "nonkinpresence", as "kinship" is exactly the same thing).

Dissipation123 · June 5, 2022, 7:52pm

Hello again,

Finding out how to do this was a little trickier than expected, as dplyr apparently doesn't have conditional counts built in, but the solution is nothing too fancy. The summarize() function can handle this as long as you sum() a logical condition, here kinship being equal to "kin".
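The reason this works is that a comparison returns TRUE/FALSE values, and sum() treats TRUE as 1 and FALSE as 0. A tiny sketch with made-up labels (the kinship vector here is hypothetical), which also shows what happens if a single = slips in where == was meant:

```r
# Hypothetical kinship labels for one monkey's top 3 partners
kinship <- c("kin", "nonkin", "nonkin")

is_nonkin <- kinship == "nonkin" # logical vector: FALSE TRUE TRUE
n_nonkin  <- sum(is_nonkin)      # TRUE counts as 1, FALSE as 0

# A single = inside sum() passes a named argument with value 0
# instead of comparing, so the result is always 0
n_wrong <- sum(kinship = 0)

n_nonkin # 2
n_wrong  # 0
```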

It seems like a lot of your work can be done by using dplyr functions so you might want to check out: Introduction to dplyr • dplyr or google dplyr tutorials.

library(dplyr)

summary_df <-
  df3 %>%
  group_by(monkey.id, groupyear) %>%
  # == compares values; use 'nonkin' instead to count non-kin partners
  summarize(kin_count = sum(kinship == 'kin'), .groups = "drop")

# alternative approach with the binary variable (stored as character in df3)
summary_df_alternative <-
  df3 %>%
  group_by(monkey.id, groupyear) %>%
  summarize(kin_count = sum(nonkinpresence == "0"), .groups = "drop") # or != "1"

Afterwards you can merge this count back into your initial dataset. The summarized count will be repeated on every row for each monkey.id and groupyear combination.

Final_df <-
  merge(df3, summary_df, by = c('monkey.id', 'groupyear'))
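If the end goal is just to have the count sitting next to every row, a grouped mutate() might save the separate summarize and merge steps. A minimal sketch on made-up data (the toy data frame and the nonkin_count column name are hypothetical), counting non-kin partners this time:

```r
library(dplyr)

# Hypothetical toy data: two focal monkeys with three top partners each
toy <- data.frame(
  groupyear = "F2010",
  monkey.id = c("m1", "m1", "m1", "m2", "m2", "m2"),
  kinship   = c("kin", "kin", "nonkin", "nonkin", "nonkin", "kin")
)

# mutate() on grouped data repeats the per-group count on every row
with_counts <- toy %>%
  group_by(monkey.id, groupyear) %>%
  mutate(nonkin_count = sum(kinship == "nonkin")) %>%
  ungroup()

with_counts$nonkin_count # 1 1 1 2 2 2
```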

ringtailedlemur · June 6, 2022, 10:28am

Thanks so much for your help, you've been brilliant.
