Hi, I have a large dataset with the weight of each observation of a survey. I´d like to use these weights to manipulate the data properly and then create a histogram or a density plot.
I know I have to use the svydesign () function, but I am not sure of the paramaters I must use.
I have:
My_data_frame
The column weights (with values between 2.2 and 8.2 for the 20.000 rows),
Using svydesign from the package survey does more than incorporate weights, it also incorporates the sampling design. Generally in the survey data documentation, you can find out what the sampling design was and how to estimate variances using the PSUs, strata, or replicate weights. Those are part of what goes into the svydesign function.
Thanks, I've been checking the guide.
The problem I found is that all the exercises and examples are made to design a representative sample an then use it. I need a different approach, I have the sample designed with their respective weights. In this case they are statistics of the income tax. And I want to use functions like svytable, svyplot for my analisys.
I think it should be simple, but the only thing I know about my sample is the weight vector and that the sample size is calculated for an error, in the mean of the income variable, less than 3% with a confidence level of 3 per thousand.
I am not sure how to proceed to tell R the survey design in this case
What survey is this? I'm not sure I understand your statement "sample size is calculated for an error, in the mean of the income variable, less than 3% with a confidence level of 3 per thousand" You could assume it is a simple random sample if you really don't know anything else. Note the survey design only impacts variances, standard errors, and test statistics, it doesn't impact point estimates like means and totals. Here's an example using the srvyr/survey package:
library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#>
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#>
#> dotchart
library(tidyverse)
library(srvyr)
#>
#> Attaching package: 'srvyr'
#> The following object is masked from 'package:stats':
#>
#> filter
# In this example, the data apiclus1 has a weight of pw
# We will pretend it is from a simple random sample
# https://cran.r-project.org/web/packages/srvyr/vignettes/srvyr-vs-survey.html
data(api) # this loads in some example data from survey package
my_des <- apisrs %>%
as_survey_design(ids=1, #no cluster
weights=pw
)
# Making a frequency table using a survey object
my_des %>%
group_by(awards) %>%
summarize(proportion=survey_mean(),
total=survey_total())
#> # A tibble: 2 x 5
#> awards proportion proportion_se total total_se
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 No 0.38 0.0344 2354. 213.
#> 2 Yes 0.62 0.0344 3840. 213.
svyplot(api00~api99, design=my_des)
Thanks for your help I was I bit confused regarding the idea of how to use the weights of my data because I didn't have more info about the sample but I think this was the key:
" You could assume it is a simple random sample if you really don't know anything else."