How to compute frequencies from two different datasets

I am trying to run a frequency analysis using two different datasets. The only variable they have in common is the site ID, and the datasets have different lengths. I tried calling table() with variables from both, but the numbers I get are incorrect. What is the first thing I need to do when running a frequency analysis on variables from two different datasets? I would appreciate any help.

Here is a simulation of what you say you have, with simple counts by site ID.
I'm not seeing an obvious problem to solve, but please feel free to make your question more precise.


# start to simulate
set.seed(42)
ssize_a <- 100

(df_a <- data.frame(site_id=sample(1:3,size = ssize_a,replace=TRUE),
                    lowers=sample(letters[1:3],size=ssize_a,replace=TRUE)))

ssize_b <- 1000

(df_b <- data.frame(site_id=sample(1:3,size = ssize_b,replace=TRUE),
                    uppers=sample(LETTERS[1:3],size=ssize_b,replace=TRUE)))

# end simulation

# start frequency analysis ..
# first one : 
table("site" = df_a$site_id,df_a$lowers)
# second one : 
table("site" = df_b$site_id,df_b$uppers)

The two datasets I am using to run the frequency analysis are very big, and each has a number of variables. Below is an example of a frequency analysis I did using one variable from each dataset, but the result gave me incorrect numbers.

table (gold_main$lang_english, gold_der$screening_base, exclude=NULL)

     0      1

0 2915 235846
1 5784 250267

Oh, I see. What you are asking for makes little sense, because there is no relationship that says which entries from the two datasets go together ...
I thought you said you had a site ID?
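
To illustrate the point, here is a small sketch with made-up data frames (df_small and df_large are hypothetical names, not your data). table() pairs its arguments by position, so vectors of different lengths are rejected outright. Even with equal lengths, row i of one data frame has no connection to row i of the other unless something like a site ID ties them together.

```r
# Hypothetical data frames of different lengths sharing only site_id.
df_small <- data.frame(site_id = c(1, 2, 3),
                       lowers  = c("a", "b", "c"))
df_large <- data.frame(site_id = c(3, 3, 1, 2, 1),
                       uppers  = c("A", "B", "C", "A", "B"))

# Different lengths: table() stops with an error rather than guessing
# how the rows should be paired. The message is captured as a string.
res <- tryCatch(table(df_small$lowers, df_large$uppers),
                error = function(e) conditionMessage(e))
print(res)
```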

Extending my previous code:


# join on the shared key, keeping all rows from both data frames
df_joined <- dplyr::full_join(df_a,
                 df_b,
                 by = "site_id",
                 multiple="all")

table(lowers=df_joined$lowers,
      uppers=df_joined$uppers)

Sorry for the confusion. The only common variable is the site ID. However, the number of variables and the size of the data differ between the two datasets. So I am pretty much taking one variable from one dataset and another variable from the other, and doing a cross-tabulation.

Yes. So either you get the independent counts, as I showed in my first post, or, to get related counts matched on site ID, you have to do a proper join, as in the example in my second post.
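
As a minimal sketch of the second option with the column names from the question: the two tiny data frames below are made up, and I am assuming both gold_main and gold_der carry a site_id column. Base R merge(..., all = TRUE) is used here in place of dplyr::full_join so that no package is needed.

```r
# Hypothetical stand-ins for the real data; only the column names
# lang_english and screening_base come from the question.
gold_main <- data.frame(site_id      = c(1, 1, 2, 3),
                        lang_english = c(0, 1, 1, 0))
gold_der  <- data.frame(site_id        = c(1, 2, 2, 4),
                        screening_base = c(1, 0, 1, 1))

# Full (outer) join on the shared key; sites present in only one
# data frame are kept, with NA in the columns from the other.
gold_joined <- merge(gold_main, gold_der, by = "site_id", all = TRUE)

# Cross-tabulate the two variables; useNA = "ifany" keeps the
# unmatched rows visible instead of silently dropping them.
xtab <- table(lang_english   = gold_joined$lang_english,
              screening_base = gold_joined$screening_base,
              useNA = "ifany")
print(xtab)
```

Note that when a site has several rows in both data frames, every pairing of those rows is counted, so the cells count row pairs rather than original records.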

Good luck.

Thank you so much!
Quick question: what do ssize_a <- 100 and ssize_b <- 1000 mean?

They were there to simulate data: they set how many records to simulate. You can see that they are used repeatedly in the sample() calls.
It's irrelevant to you, aside from letting you reproduce the df_a and df_b I had to work with for the examples.
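
A short sketch of what those two pieces do, reusing the names from my simulation: the size argument controls how many values sample() draws, and set.seed() makes the random draw repeatable.

```r
# ssize_a is just the number of records to simulate.
ssize_a <- 100

set.seed(42)
x <- sample(1:3, size = ssize_a, replace = TRUE)
print(length(x))        # one value per simulated record

# Resetting the same seed reproduces exactly the same draw.
set.seed(42)
y <- sample(1:3, size = ssize_a, replace = TRUE)
print(identical(x, y))
```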

Would you please use lang_english from the gold_main data and screening_base from the gold_der data in your example? I am still confused.
