Convert matrix of items to dataset

DJoubert · December 12, 2022, 3:46am

Hello,

I have a matrix of items from a psychological test that I would to convert to a dataset format (dataframe).

The format of the matrix is:

1: 2 2: 2 3: 2 4: 1 5: 2 6: 1 7: 2 8: 2 9: 1 10: 1, etc. The numbers before the ":" correspond to the item number, the number following corresponds to the response (1=yes; 2=no).

I need to create a dataset with two rows, the first being the item number, the second being the response. So it would be a really wide dataset since there are 567 items.

Anyone could give suggestions about the package that I could use to do this (or example of code if possible) ?

FJCC · December 12, 2022, 5:04am

It is not clear what your original data structure is. If it is a vector like DAT in the code below, then this set of steps might work for you.

``` r
DAT <- c('1: 2', '2: 2', '3: 2', '4: 1', '5: 2', '6: 1', 
  '7: 2', '8: 2', '9: 1', '10: 1')
DAT
#>  [1] "1: 2"  "2: 2"  "3: 2"  "4: 1"  "5: 2"  "6: 1"  "7: 2"  "8: 2"  "9: 1" 
#> [10] "10: 1"
library(stringr)
item <- str_extract(DAT, "^\\d+")
item
#>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
response <- str_extract(DAT, "\\d+$")
response
#>  [1] "2" "2" "2" "1" "2" "1" "2" "2" "1" "1"
MAT <- rbind(item, response)
DF <- as.data.frame(MAT)
DF
#>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#> item      1  2  3  4  5  6  7  8  9  10
#> response  2  2  2  1  2  1  2  2  1   1

^{Created on 2022-12-11 with reprex v2.0.2}

If your data are in a matrix, please post an example of a small subset. You can post the code to make the subset using the matrix() function or post the output of

dput(MAT)

where MAT is your small example.

DJoubert · December 12, 2022, 4:22pm

Hello,

The matrix is in a text document as a .txt file.
It goes like this:

1: 2 2: 2 3: 2 4: 1 5: 2 6: 1 7: 2 8: 2 9: 1 10: 1
11: 2 12: 1 13: 1 14: 2 15: 1 16: 2 17: 2 18: 2 19: 2 20: 1
21: 2 22: 2 23: 2 24: 2 25: 2 26: 2 27: 2 28: 2 29: 1 30: 2
31: 1 32: 2 33: 1 34: 1 35: 2 36: 2 37: 1 38: 1 39: 1 40: 2
41: 2 42: 1 43: 2 44: 2 45: 1 46: 1 47: 1 48: 2 49: 1 50: 1
(...)

There are 567 items in total.

This should be a simple problem to solve but for some reason I am really stuck...

Thanks,

David

FJCC · December 12, 2022, 6:32pm

In the example you posted, everything is separated with spaces, so I wrote code to handle that case. I read in the whole data set and then took the odd columns as items and the even columns as responses. The items included the colons, so I deleted those. I then rotated the data into a long form with a name column showing the original column they came from (V1, V2,etc) and an item column storing the item value. I replaced the name with a sequence of V1 to V50 and then rotated the data into the wide format of 1 row and 50 columns. I repeated the process with the responses and then used rbind to make the final data of two rows and 50 columns.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
DF <- read.table("~/R/Play/TexMat.txt", sep = " ")
items <- DF[, seq(1, 19, by = 2)]
items
#>    V1  V3  V5  V7  V9 V11 V13 V15 V17 V19
#> 1  1:  2:  3:  4:  5:  6:  7:  8:  9: 10:
#> 2 11: 12: 13: 14: 15: 16: 17: 18: 19: 20:
#> 3 21: 22: 23: 24: 25: 26: 27: 28: 29: 30:
#> 4 31: 32: 33: 34: 35: 36: 37: 38: 39: 40:
#> 5 41: 42: 43: 44: 45: 46: 47: 48: 49: 50:
#delete the colons
items <- items |> mutate(across(.cols = everything(),.fns = ~sub(":", "", .x)))
items
#>   V1 V3 V5 V7 V9 V11 V13 V15 V17 V19
#> 1  1  2  3  4  5   6   7   8   9  10
#> 2 11 12 13 14 15  16  17  18  19  20
#> 3 21 22 23 24 25  26  27  28  29  30
#> 4 31 32 33 34 35  36  37  38  39  40
#> 5 41 42 43 44 45  46  47  48  49  50

items <- items |> pivot_longer(cols = everything(), values_to = "item") |> 
  mutate(name = paste0("V", 1:(nrow(DF)*ncol(DF)/2))) |> 
  pivot_wider(names_from = "name", values_from = "item")


response <- DF[, seq(2, 20, by = 2)]
response <- response |> pivot_longer(cols = everything(), 
                                     values_to = "response") |> 
  mutate(name = paste0("V", 1:(nrow(DF)*ncol(DF)/2))) |> 
  pivot_wider(names_from = "name", values_from = "response")

Final <- rbind(items, response)
nrow(Final)
#> [1] 2
ncol(Final)
#> [1] 50

^{Created on 2022-12-12 with reprex v2.0.2}

Andrzej · December 12, 2022, 7:06pm

Hi @FJCC,

can you share DF please ? This is difficult to follow without example data.

FJCC · December 12, 2022, 7:12pm

Here you go.

structure(list(V1 = c("1:", "11:", "21:", "31:", "41:"), V2 = c(2L, 
2L, 2L, 1L, 2L), V3 = c("2:", "12:", "22:", "32:", "42:"), V4 = c(2L, 
1L, 2L, 2L, 1L), V5 = c("3:", "13:", "23:", "33:", "43:"), V6 = c(2L, 
1L, 2L, 1L, 2L), V7 = c("4:", "14:", "24:", "34:", "44:"), V8 = c(1L, 
2L, 2L, 1L, 2L), V9 = c("5:", "15:", "25:", "35:", "45:"), V10 = c(2L, 
1L, 2L, 2L, 1L), V11 = c("6:", "16:", "26:", "36:", "46:"), V12 = c(1L, 
2L, 2L, 2L, 1L), V13 = c("7:", "17:", "27:", "37:", "47:"), V14 = c(2L, 
2L, 2L, 1L, 1L), V15 = c("8:", "18:", "28:", "38:", "48:"), V16 = c(2L, 
2L, 2L, 1L, 2L), V17 = c("9:", "19:", "29:", "39:", "49:"), V18 = c(1L, 
2L, 1L, 1L, 1L), V19 = c("10:", "20:", "30:", "40:", "50:"), 
    V20 = c(1L, 1L, 2L, 2L, 1L)), class = "data.frame", row.names = c(NA, 
-5L))

Andrzej · December 12, 2022, 7:21pm

Thank you very much, can you please share a code how did you create that DF ? This is very interesting by itself.
best regards.

Andrzej · December 14, 2022, 6:47am

Just moving up to kindly remind about my question how to create such a matrix ?
I would be grateful for reply, thank you.

FJCC · December 14, 2022, 7:55am

Sorry, I had not seen your question. I copied the data from @DJoubert 's post above into a text file.

system · January 25, 2023, 7:55am

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.