Dear RStudio community,
I am trying to create a subset of some data, but given the nature of the data I need certain conditions to be met.
Each row of my data contains a single payment, and each payment has a variable specifying a contact number. Some customers have made multiple payments, which fall in different rows but are labelled with the same contact number.
Therefore, if the subset selects one payment from a contact number, it needs to include all other rows (payments) containing that contact number.
See an example of customer 18445 below. I need a subset to include all 9 payments that have been made under this contact number, if it is randomly selected.
Hi woodward, thank you for your help, but unfortunately this is not what I mean.
I need a random selection of around 3000 rows from a dataset containing 52000 rows. Some contact numbers are repeated several times, so if the random selection chooses one row containing a given contact number, it needs to select all rows containing that contact number. The subset will therefore end up containing many different contact numbers and not just 18445 (this was just an example to show that contact numbers are sometimes repeated).
Do you think there is a solution for this?
Many Thanks,
Naja
Do you need exactly 3000 rows? That will make it rather difficult, because you would need to "randomly" choose contact numbers whose rows add up to exactly 3000, which may not even be possible.
What do you mean by random? If you sample rows and then pull in every row that matches a sampled contact number, the sample will be biased toward customers with many payments. Or can you randomly sample the contact numbers instead?
The easiest approach is to make a list of contact numbers and keep choosing one at random until you have at least 3000 rows. But this might not be what you want.
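For what it is worth, here is a minimal base R sketch of that last idea. The data frame `payments` and its columns `contact` and `amount` are made-up stand-ins, since the real data is not shown, so the names would need to be adapted.

```r
# Hypothetical stand-in data: one row per payment, `contact` is the contact number.
set.seed(42)  # only so the sketch is reproducible
payments <- data.frame(
  contact = sample(1:20000, 52000, replace = TRUE),
  amount  = round(runif(52000, 5, 500), 2)
)

target_rows <- 3000

# Put the distinct contact numbers in random order.
ids <- sample(unique(payments$contact))

# Count how many rows (payments) each contact number has, in that shuffled order.
rows_per_id <- tabulate(match(payments$contact, ids), nbins = length(ids))

# Take contact numbers from the front of the shuffled list until the
# running total of rows first reaches the target.
n_keep <- which(cumsum(rows_per_id) >= target_rows)[1]
chosen <- ids[seq_len(n_keep)]

# Keep every payment belonging to a chosen contact number.
payments_subset <- payments[payments$contact %in% chosen, ]
nrow(payments_subset)
```

The subset will usually overshoot 3000 slightly, because the last contact number added brings all of its payments with it.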
Hi woodward,
No, it does not need to be exactly 3000. I just need it organised by contact number, as I will use the sample to calculate a retention rate and therefore need all payments made under each contact number that is selected.
It would be good to randomly sample the contact numbers and then get a subset from that.
Naja
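Sampling the contact numbers first and then subsetting could look roughly like the sketch below. It is only an illustration: `payments`, `contact` and `amount` are hypothetical names, and the number of contact numbers to draw is estimated from the average number of payments per contact, so the result lands near 3000 rows rather than exactly on it.

```r
# Hypothetical stand-in data: one row per payment, `contact` is the contact number.
set.seed(1)  # only so the sketch is reproducible
payments <- data.frame(
  contact = sample(1:20000, 52000, replace = TRUE),
  amount  = round(runif(52000, 5, 500), 2)
)

# Distinct contact numbers and the average number of payments per contact.
ids      <- unique(payments$contact)
avg_rows <- nrow(payments) / length(ids)

# Rough number of contact numbers needed to end up near 3000 rows.
n_ids <- round(3000 / avg_rows)

# Randomly sample contact numbers (without replacement).
chosen <- sample(ids, n_ids)

# Keep all payments for the sampled contact numbers, ordered by contact number.
payments_subset <- payments[payments$contact %in% chosen, ]
payments_subset <- payments_subset[order(payments_subset$contact), ]

# Number of payments per sampled contact number.
payments_per_contact <- table(payments_subset$contact)
head(payments_per_contact)
```

Because every contact number has the same chance of being drawn, this avoids favouring customers with many payments, and each selected customer keeps their full payment history.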