Convert matrix of items to dataset

Hello,

I have a matrix of items from a psychological test that I would to convert to a dataset format (dataframe).

The format of the matrix is:

1: 2 2: 2 3: 2 4: 1 5: 2 6: 1 7: 2 8: 2 9: 1 10: 1, etc. The numbers before the ":" correspond to the item number, the number following corresponds to the response (1=yes; 2=no).

I need to create a dataset with two rows, the first being the item number, the second being the response. So it would be a really wide dataset since there are 567 items.

Anyone could give suggestions about the package that I could use to do this (or example of code if possible) ?

It is not clear what your original data structure is. If it is a vector like DAT in the code below, then this set of steps might work for you.

``` r
DAT <- c('1: 2', '2: 2', '3: 2', '4: 1', '5: 2', '6: 1', 
  '7: 2', '8: 2', '9: 1', '10: 1')
DAT
#>  [1] "1: 2"  "2: 2"  "3: 2"  "4: 1"  "5: 2"  "6: 1"  "7: 2"  "8: 2"  "9: 1" 
#> [10] "10: 1"
library(stringr)
item <- str_extract(DAT, "^\\d+")
item
#>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
response <- str_extract(DAT, "\\d+$")
response
#>  [1] "2" "2" "2" "1" "2" "1" "2" "2" "1" "1"
MAT <- rbind(item, response)
DF <- as.data.frame(MAT)
DF
#>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#> item      1  2  3  4  5  6  7  8  9  10
#> response  2  2  2  1  2  1  2  2  1   1

Created on 2022-12-11 with reprex v2.0.2

If your data are in a matrix, please post an example of a small subset. You can post the code to make the subset using the matrix() function or post the output of

dput(MAT)

where MAT is your small example.

Hello,

The matrix is in a text document as a .txt file.
It goes like this:

1: 2 2: 2 3: 2 4: 1 5: 2 6: 1 7: 2 8: 2 9: 1 10: 1
11: 2 12: 1 13: 1 14: 2 15: 1 16: 2 17: 2 18: 2 19: 2 20: 1
21: 2 22: 2 23: 2 24: 2 25: 2 26: 2 27: 2 28: 2 29: 1 30: 2
31: 1 32: 2 33: 1 34: 1 35: 2 36: 2 37: 1 38: 1 39: 1 40: 2
41: 2 42: 1 43: 2 44: 2 45: 1 46: 1 47: 1 48: 2 49: 1 50: 1
(...)

There are 567 items in total.

This should be a simple problem to solve but for some reason I am really stuck...

Thanks,

David

In the example you posted, everything is separated with spaces, so I wrote code to handle that case. I read in the whole data set and then took the odd columns as items and the even columns as responses. The items included the colons, so I deleted those. I then rotated the data into a long form with a name column showing the original column they came from (V1, V2,etc) and an item column storing the item value. I replaced the name with a sequence of V1 to V50 and then rotated the data into the wide format of 1 row and 50 columns. I repeated the process with the responses and then used rbind to make the final data of two rows and 50 columns.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
DF <- read.table("~/R/Play/TexMat.txt", sep = " ")
items <- DF[, seq(1, 19, by = 2)]
items
#>    V1  V3  V5  V7  V9 V11 V13 V15 V17 V19
#> 1  1:  2:  3:  4:  5:  6:  7:  8:  9: 10:
#> 2 11: 12: 13: 14: 15: 16: 17: 18: 19: 20:
#> 3 21: 22: 23: 24: 25: 26: 27: 28: 29: 30:
#> 4 31: 32: 33: 34: 35: 36: 37: 38: 39: 40:
#> 5 41: 42: 43: 44: 45: 46: 47: 48: 49: 50:
#delete the colons
items <- items |> mutate(across(.cols = everything(),.fns = ~sub(":", "", .x)))
items
#>   V1 V3 V5 V7 V9 V11 V13 V15 V17 V19
#> 1  1  2  3  4  5   6   7   8   9  10
#> 2 11 12 13 14 15  16  17  18  19  20
#> 3 21 22 23 24 25  26  27  28  29  30
#> 4 31 32 33 34 35  36  37  38  39  40
#> 5 41 42 43 44 45  46  47  48  49  50

items <- items |> pivot_longer(cols = everything(), values_to = "item") |> 
  mutate(name = paste0("V", 1:(nrow(DF)*ncol(DF)/2))) |> 
  pivot_wider(names_from = "name", values_from = "item")


response <- DF[, seq(2, 20, by = 2)]
response <- response |> pivot_longer(cols = everything(), 
                                     values_to = "response") |> 
  mutate(name = paste0("V", 1:(nrow(DF)*ncol(DF)/2))) |> 
  pivot_wider(names_from = "name", values_from = "response")

Final <- rbind(items, response)
nrow(Final)
#> [1] 2
ncol(Final)
#> [1] 50

Created on 2022-12-12 with reprex v2.0.2

Hi @FJCC,

can you share DF please ? This is difficult to follow without example data.

Here you go.

structure(list(V1 = c("1:", "11:", "21:", "31:", "41:"), V2 = c(2L, 
2L, 2L, 1L, 2L), V3 = c("2:", "12:", "22:", "32:", "42:"), V4 = c(2L, 
1L, 2L, 2L, 1L), V5 = c("3:", "13:", "23:", "33:", "43:"), V6 = c(2L, 
1L, 2L, 1L, 2L), V7 = c("4:", "14:", "24:", "34:", "44:"), V8 = c(1L, 
2L, 2L, 1L, 2L), V9 = c("5:", "15:", "25:", "35:", "45:"), V10 = c(2L, 
1L, 2L, 2L, 1L), V11 = c("6:", "16:", "26:", "36:", "46:"), V12 = c(1L, 
2L, 2L, 2L, 1L), V13 = c("7:", "17:", "27:", "37:", "47:"), V14 = c(2L, 
2L, 2L, 1L, 1L), V15 = c("8:", "18:", "28:", "38:", "48:"), V16 = c(2L, 
2L, 2L, 1L, 2L), V17 = c("9:", "19:", "29:", "39:", "49:"), V18 = c(1L, 
2L, 1L, 1L, 1L), V19 = c("10:", "20:", "30:", "40:", "50:"), 
    V20 = c(1L, 1L, 2L, 2L, 1L)), class = "data.frame", row.names = c(NA, 
-5L))

Thank you very much, can you please share a code how did you create that DF ? This is very interesting by itself.
best regards.

Just moving up to kindly remind about my question how to create such a matrix ?
I would be grateful for reply, thank you.

Sorry, I had not seen your question. I copied the data from @DJoubert 's post above into a text file.

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.