Help with subset

Hi! I am new to R and trying to learn how to subset my survey data. I have a question, Q11, in my dataset where people select which type of education they have had. I am trying to collapse 3 of the options into a single overall category, for example "Masters", because I have a lot of zeros. I am also trying to compare this across 5 different groups (A, B, C, D, E), which is Q15 in my dataset: for example, which education each group has.

Is there anywhere you can go to hire someone to teach you how to use R???

Hi, I'm sure you could hire someone if you wanted to.

However, you could also provide some more information and we might be able to help you work it out yourself. A reproducible example and the code that you have tried would be useful.
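For instance, a small reproducible example could be built along these lines. This is only a sketch: the data frame `survey` and its values are made up, with Q11 and Q15 borrowed from the question as column names.

```r
# Made-up stand-in for the real survey data (values are assumptions)
survey <- data.frame(
  Q11 = c("MA", "MSc", "PhD", "BSc"),
  Q15 = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object, ready to paste into a post
dput(survey)
```

On real data, `dput(head(your_data, 20))` is usually enough to show the structure without posting everything.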

On learning resources, see The Big Book of R. A good introductory course is R Basics. For tutoring, there's a jobs board here (see this example).

Every R problem can usefully be thought of as the interaction of three objects: an existing object, x; a desired object, y; and a function, f, that returns y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.
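Applied to the survey question, the f(x) = y framing might look roughly like the sketch below. Everything in it is an assumption made for illustration: the data frame `survey`, the education labels, and the helper `collapse_masters()` are all invented, since the real data were not posted.

```r
# x: hypothetical survey data; Q11 = education, Q15 = group (A to E).
# All values are invented for illustration only.
survey <- data.frame(
  Q11 = c("MA", "MSc", "MBA", "PhD", "BSc", "MSc", "BA", "MA"),
  Q15 = c("A", "B", "C", "D", "E", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# f: collapse three master's-level options into one overall category
# (collapse_masters is a hypothetical helper, not an existing function)
collapse_masters <- function(x) {
  ifelse(x %in% c("MA", "MSc", "MBA"), "Masters", x)
}

survey$Q11_collapsed <- collapse_masters(survey$Q11)

# y: counts of each collapsed education category within each group
y <- table(survey$Q11_collapsed, survey$Q15)
y
```

The original Q11 column is left untouched, so the collapsed version can be checked against it before anything is overwritten.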

Here's an illustration of a problem similar to yours, except with rows, rather than columns.

# x is a large matrix of integers (named DF below)
# 
# simulated data to substitute for the actual x
set.seed(42)
(DF <- rbind(sample(-20:400,20),sample(-20:400,20),sample(-20:400,20),sample(-20:400,20),sample(-20:400,20)))
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,]   28  300  132   53  207  125  101  400  107   282     3   306   335    68
#> [2,]  276   68  262   88  -16  191  327  339  238   293   277     3   137   278
#> [3,]  125   88  327  176  -17  205  334  194  224   389    93   241   369   109
#> [4,]  117   19  -16   12   82  207   88  308  136    55   244    14   200    -5
#> [5,]   61  387  348  381  304   89  339  275  128    36    79   277   397    70
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,]   144    89    -1   349   346   366
#> [2,]   378   385   391   115   271   303
#> [3,]   351   -18   353   237   337   165
#> [4,]   336   199   227   304    97   109
#> [5,]   248   160    33   318   267   187

# Substitute a row with no values outside the range

DF[3,] <- 1:20

# y is a smaller object made up of the rows of x that contain at least one integer outside the range 1:365, so let's create an object to represent that range

boring <- 1:365

# Every row of y is also a row of x, so y is a subset of x. identical(x,y) is
# TRUE if, and only if, every row of x contains at least one integer outside
# the specified range. The objective is to find all rows that have one or more
# integers outside the specified range.
# 
# f is a function that needs to be composed, so let's start

# determine whether a single integer is outside the specified range

find_outsider <- function(x) !(x %in% boring)

# example

find_outsider(400)
#> [1] TRUE

# determine which integers in a single row are outside the range

find_outsider(DF[1,])
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

# but we're only really interested if there is at least one
sum(find_outsider(DF[1,])) > 0
#> [1] TRUE

# let's glue this together

pick_rows <- function(x,y) sum(find_outsider(x[y,])) > 0

# create a logical vector to hold results, one element per row

hits <- logical(nrow(DF))

# loop over rows; seq_len() is safe even when there are zero rows

for (i in seq_len(nrow(DF))) hits[i] <- pick_rows(DF, i)

# use the hits logical vector to subset DF


DF[hits,]
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,]   28  300  132   53  207  125  101  400  107   282     3   306   335    68
#> [2,]  276   68  262   88  -16  191  327  339  238   293   277     3   137   278
#> [3,]  117   19  -16   12   82  207   88  308  136    55   244    14   200    -5
#> [4,]   61  387  348  381  304   89  339  275  128    36    79   277   397    70
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,]   144    89    -1   349   346   366
#> [2,]   378   385   391   115   271   303
#> [3,]   336   199   227   304    97   109
#> [4,]   248   160    33   318   267   187
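
The same row selection can also be written without an explicit loop. The sketch below is an alternative to the code above, not a replacement for it. It uses a small hand-made matrix `m` (invented values) in place of DF, with `apply()` and `any()` standing in for the loop and the `sum(...) > 0` test.

```r
# Small hand-made matrix (invented values) standing in for DF
m <- rbind(
  c(10, 400, 20),   # 400 is outside 1:365
  c(1, 2, 3),       # all inside
  c(-5, 50, 60),    # -5 is outside
  c(100, 200, 365)  # all inside
)
boring <- 1:365

# TRUE for each row holding at least one value outside the range
hits <- apply(m, 1, function(row) any(!(row %in% boring)))

# drop = FALSE keeps the result a matrix even if only one row matches
m[hits, , drop = FALSE]
```

Here `any()` states the intent ("at least one") more directly than comparing a sum with zero, and `apply(m, 1, ...)` visits each row in turn.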

This makes use of the subset operator [, which allows easy selection of rows, columns, or both rows and columns.

mtcars[1:2,]
#>               mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
mtcars[,3:4]
#>                      disp  hp
#> Mazda RX4           160.0 110
#> Mazda RX4 Wag       160.0 110
#> Datsun 710          108.0  93
#> Hornet 4 Drive      258.0 110
#> Hornet Sportabout   360.0 175
#> Valiant             225.0 105
#> Duster 360          360.0 245
#> Merc 240D           146.7  62
#> Merc 230            140.8  95
#> Merc 280            167.6 123
#> Merc 280C           167.6 123
#> Merc 450SE          275.8 180
#> Merc 450SL          275.8 180
#> Merc 450SLC         275.8 180
#> Cadillac Fleetwood  472.0 205
#> Lincoln Continental 460.0 215
#> Chrysler Imperial   440.0 230
#> Fiat 128             78.7  66
#> Honda Civic          75.7  52
#> Toyota Corolla       71.1  65
#> Toyota Corona       120.1  97
#> Dodge Challenger    318.0 150
#> AMC Javelin         304.0 150
#> Camaro Z28          350.0 245
#> Pontiac Firebird    400.0 175
#> Fiat X1-9            79.0  66
#> Porsche 914-2       120.3  91
#> Lotus Europa         95.1 113
#> Ford Pantera L      351.0 264
#> Ferrari Dino        145.0 175
#> Maserati Bora       301.0 335
#> Volvo 142E          121.0 109
mtcars[1,1]
#> [1] 21
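
The same operator also accepts a logical condition for rows and names for columns, which is often closer to what survey work needs. A brief sketch using the built-in mtcars data (the object name `six_cyl` is just for illustration):

```r
# Rows chosen by a logical condition, columns chosen by name
six_cyl <- mtcars[mtcars$cyl == 6, c("mpg", "hp")]
nrow(six_cyl)

# A single column drops to a plain vector unless drop = FALSE is given
is.data.frame(mtcars[, "mpg"])
is.data.frame(mtcars[, "mpg", drop = FALSE])
```

The `drop = FALSE` argument is worth remembering whenever later code expects a data frame rather than a vector.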