Im stuck with my dataanalysis i R. I hope you can help. I am analyzing the impact of migrating from rural to urban areas in Tanzania and want to make a dietary diversity score for rural households before they migrate and after they migrate. I divided every fooditem into 10 groups. If the household consume 5 or more groups they have dietary diversity.
Rice is in group 1, cassava is in group 1, sweet potatoes is in group 8 and so on. How do I do if I want to tell R to give household#1 a score of 1 if they consumed food from 5 or more groups and a 0 if not. Remember that plenty of foods are in under the same foodgroup but I am only interested in the intake of different foodgroups. So the household should not get 1 two times if they consumed food from the same group twice.
To give an example of how my data looks like, see below. There is 59 different fooditems divided in 10 foodgroups. E.g. household#1 has 59 rows. Thereafter comes household#2 and so on downwards. There is 281 households.
How do I count the mean dietary diversity score (if the household consumes 5 or more foodgroups) from all of the household together and seperately? I would like to make a plot with dietary diversity score on the y-axis (from 1-10) and on the x-axis the different household (hh1rural, hh1urban;hh2rural, hh2urban; hh3rural, hh3urban...) and one plot with the mean dietary diversity score on the y-axis and on the x-axis rural households and urban households.
I hope you understand my question. Thank you so much in advance.
Here is how I would do it, using a small data set. I wrote a small function that filters each family's data for Consume == "Yes" and then counts the number of unique Group values.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#Invent data
set.seed(123)
DF <- data.frame(ID = rep(1:3, each = 5),
Item = sample(101:159, 5),
Group = sample(1:10, 15, replace = TRUE),
Consume = sample(c("No", "Yes"), 15, replace = TRUE))
DF
#> ID Item Group Consume
#> 1 1 131 10 Yes
#> 2 1 115 2 No
#> 3 1 151 6 Yes
#> 4 1 114 5 No
#> 5 1 103 4 Yes
#> 6 2 131 6 No
#> 7 2 115 9 Yes
#> 8 2 151 10 Yes
#> 9 2 114 5 No
#> 10 2 103 3 No
#> 11 3 131 9 No
#> 12 3 115 9 No
#> 13 3 151 9 Yes
#> 14 3 114 3 No
#> 15 3 103 8 Yes
#Make a function to count the unique Groups
DiversityFunc <- function(G, C) {
G <- G[C == "Yes"]
length(unique(G))
}
DiversityDF <- DF %>% group_by(ID) %>%
summarize(GroupCount = DiversityFunc(Group, Consume))
#> `summarise()` ungrouping output (override with `.groups` argument)
I tried to run the code now @FJCC. It did what I told it to do, I tried to interpreted your code into my own R with the exact number of colums, rows etc. Something went wrong during the process. The thing is that the fooditems not are in order, so it will look like this: 101, 102, 203, 204, 205, 300, 301 and so on. So when I run the code it is just chosing randomly numbers it seems like. It looks like this:
So the fooditems "Værdi" is not the actual ones and also I can see that the intake (hh_j01) gives a one twice to foodgroupvalue "Fødevaregruppeværdi" no 6. I want to tell R only to count 1 one time in the same foodgroup and not twice.
Data as it is now looks like this with all the important colums
y3_hhid: ID, Værdi=fooditemvalue, Fødevare=food, Fødevaregruppe=foodgroup, Fødevaregruppeværdi=foodgroupvalue, hh_j01=1=yes, 2=no (consume)
The resulting data frame DiversityDF should have one row for each value of y3_hhid and its GroupCount column will show how many unique values of Fødevaregruppeværdi had hh_j01 == 1.
If that does not work, please make a small data set with two values of y3_hhid with each of those having a few different foods. Ten rows of data is enough. Let's say you name that small data set DF. Post the output of
dput(DF)
and place a line with three back ticks just before and after the output, like this
```
your output here
```
We can then both work with the same data set.
@FJCC, do you have any idea how to count how many "1" (yes) there is in each of the 10 foodgroups for all the households? The foodintake in each foodgroup still needs to be unique.
What a would like is a simple table looking like this.
Foodgroup n intake
1 201
2 99
3 137
So the n intake will be showing how many of the 281 household that did consume food from the specifik foodgroup. Then after I will calculate the percentage.
CountHousehold <- DF %>% filter(hh_j01 == 1) %>% #keep only the "Yes" rows
select(y3_hhid, Fødevaregruppeværdi) %>% #keep only these two columns
distinct() %>% #Remove duplicates
count(Fødevaregruppeværdi) #Count how many times each Fødevaregruppeværdi appears
That was what I was looking for thank you so much!
I noticed that there are some households with dietary diversity=0 which is unlikely. I have identified those households by subsetting. Now I want to delete those from my initial data. I tried all the tips from google nothing works.
I have a dataset "subsetmhdds0" with my 11 households that needs to be removed from another dataset. The IDs are in both datasets called "hh_a09". Do you have any suggestions how to remove these households? The IDs are listet as exponents eg 8.060120e+12 which may be a problem, because there's multiple IDs looking like that.
It is hard to be specific without seeing either your code or your data. Starting with the second problem, your IDs have been read as numbers. If they are numbers in both data sets, that is not a problem. If one data set has numbers and the other characters, you can convert the numbers to characters with code like