How do I compare my predictions with the original group variable?

I admit my question is not very clear, so let the data speak. I have the true labels:

> vowel.test$y
  [1]  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2
 [47]  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4
 [93]  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6
[139]  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8
[185]  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10
[231] 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1
[277]  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3
[323]  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5
[369]  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7
[415]  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9 10 11  1  2  3  4  5  6  7  8  9
[461] 10 11

This is the true group variable: the values 1, 2, 3, ..., 11, repeated 42 times. So there are 11 * 42 = 462 observations.
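A quick way to rebuild and sanity-check this label structure is sketched below. The name vty is only an illustrative choice for the true labels.

```r
# Rebuild the true labels: the sequence 1..11 repeated 42 times
vty <- rep(1:11, times = 42)

length(vty)   # total number of observations (462)
table(vty)    # each of the 11 groups should appear 42 times
```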

I also predicted the group of each of these 462 observations:

> qdatestpredicted
  [1] 1  2  2  4  7  6  7  8  8  1  11 1  2  2  4  7  6  7  8  8  1  6  1  2  2  4  7  6  7  8  8  1  6  1  2  2  4  7  6  7  8  8  1  11 1  2 
 [47] 2  3  7  9  7  8  8  1  11 1  2  2  3  7  11 7  8  8  1  11 2  2  4  4  5  5  7  8  9  10 9  2  2  4  4  5  6  7  8  9  1  9  2  1  4  4 
 [93] 5  6  7  8  9  1  9  2  1  4  4  5  6  7  8  10 10 11 2  1  3  4  5  6  7  8  10 10 11 2  1  3  4  5  6  7  8  10 10 11 1  2  3  6  7  5 
[139] 7  8  9  9  11 1  2  3  6  7  5  7  8  8  10 9  1  2  3  6  7  5  7  8  9  10 9  1  2  3  6  7  5  7  7  7  10 9  1  2  3  6  5  6  7  8 
[185] 9  10 9  1  1  3  6  5  6  7  8  8  10 11 2  2  3  4  6  6  7  7  9  2  11 2  1  3  4  6  6  7  7  8  2  11 2  1  3  4  6  6  7  7  8  2 
[231] 11 2  1  3  4  5  6  7  8  8  10 11 1  1  3  4  5  6  7  8  8  1  11 1  1  3  4  5  11 5  8  9  1  11 1  2  2  4  6  6  5  7  8  9  6  1 
[277] 3  2  6  6  6  7  7  8  9  6  1  2  2  6  6  6  7  7  7  9  6  1  2  7  6  6  5  7  7  7  9  7  1  3  7  6  6  6  7  7  9  9  7  1  2  3 
[323] 4  6  6  6  7  9  9  7  1  2  7  6  6  6  6  10 11 1  11 1  2  3  6  6  6  6  7  11 2  11 1  2  3  6  6  6  6  7  11 3  4  1  3  3  6  6 
[369] 6  6  7  11 3  11 1  2  7  6  6  6  6  7  4  3  9  1  2  3  6  6  4  6  7  6  1  9  2  2  2  4  6  6  7  7  9  1  11 2  2  2  4  6  6  7 
[415] 7  9  10 11 1  2  3  4  6  6  7  7  9  10 11 1  2  3  4  6  6  7  7  9  10 11 1  2  3  4  6  6  7  7  9  10 11 1  2  2  4  6  6  7  7  9 
[461] 10 11
Levels: 1 2 3 4 5 6 7 8 9 10 11

Now I need to judge which groups are the most difficult to tell apart, so I want to know the error rate within each group. How can I do that?

The code below gives me the overall number of correct predictions, and from that the overall accuracy and error rate. But how do I get the error rate within each group?

> sum(as.vector(as.numeric(qdatestpredicted)) == vowel.test$y)
[1] 256
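The count of 256 becomes an overall rate once it is divided by the 462 observations. The sketch below also shows one caveat, using a small hypothetical factor f: as.numeric() on a factor returns the internal level codes, not the labels. That is harmless here because the printed levels are 1 to 11 in numeric order, but as.numeric(as.character(...)) is the safer conversion in general.

```r
# Overall accuracy and error rate from the count reported above
accuracy   <- 256 / 462
error_rate <- 1 - accuracy
round(c(accuracy = accuracy, error_rate = error_rate), 3)  # about 0.554 and 0.446

# Pitfall: as.numeric() on a factor returns level codes, not labels.
# f is a hypothetical factor whose levels are not in numeric order.
f <- factor(c("10", "2", "1"), levels = c("1", "2", "10"))
as.numeric(f)                # level codes: 3 2 1
as.numeric(as.character(f))  # actual labels: 10 2 1
```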

I would use the summarize() function from dplyr. With test data that looks like this:

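set.seed(2021)
# true labels: 1..11 repeated 42 times
vty <- rep(1:11, times=42)
# simulated "predictions": random groups, only for illustration
qtp <- sample(1:11, 462, replace=TRUE)
df1 <- data.frame(vty=vty, qtp=qtp)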

library(dplyr)
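df2 <- df1 %>%
  mutate(eq=ifelse(vty==qtp,1,0)) %>%   # 1 if the prediction matches the true group, else 0
  group_by(vty) %>%                     # one group per true class
  summarize(eq=sum(eq))                 # number of correct predictions per class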
print(df2)
#> # A tibble: 11 x 2
#>      vty    eq
#>    <int> <dbl>
#>  1     1     1
#>  2     2     2
#>  3     3     3
#>  4     4     5
#>  5     5     5
#>  6     6     6
#>  7     7     4
#>  8     8     5
#>  9     9     1
#> 10    10     4
#> 11    11     5
Created on 2021-12-01 by the reprex package (v2.0.0)
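To get the error rate within each group directly, the same pipeline can also count the observations per group and convert the number of matches into an error rate. Below is a sketch on the same simulated data; df3, n_obs, correct and error_rate are illustrative names. On the real data, vty and qtp would be vowel.test$y and as.numeric(as.character(qdatestpredicted)).

```r
library(dplyr)

# Same simulated data as above: true labels vty, random "predictions" qtp
set.seed(2021)
vty <- rep(1:11, times = 42)
qtp <- sample(1:11, 462, replace = TRUE)
df1 <- data.frame(vty = vty, qtp = qtp)

# Per group: number of observations, number of correct predictions and
# error rate, sorted so the hardest groups to classify come first
df3 <- df1 %>%
  group_by(vty) %>%
  summarize(
    n_obs      = n(),
    correct    = sum(vty == qtp),
    error_rate = 1 - correct / n_obs
  ) %>%
  arrange(desc(error_rate))
print(df3)
```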

Can you explain what this code returns? Does it mean there is 1 match for group 1, 2 matches for group 2, 3 matches for group 3, 5 matches for group 4, etc.?

Yes, that is the case. If you leave out the last two steps (group_by() and summarize()), you can check the individual rows:

df2 <- df1 %>%
  mutate(eq=ifelse(vty==qtp,1,0))
print(df2)

That is easier to check if you change 42 to 4 and 462 to 44 in the test data, so there are only 44 rows to inspect.
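Besides counting matches, a confusion matrix shows which groups are mistaken for which, and that speaks directly to which groups are hard to tell apart. A minimal base R sketch with table() on the smaller 44-row version follows; cm and per_group_error are illustrative names. On the real data the same call would take vowel.test$y and qdatestpredicted.

```r
# Smaller simulated data (4 repetitions, 44 rows) so every row can be inspected
set.seed(2021)
vty <- rep(1:11, times = 4)
qtp <- sample(1:11, 44, replace = TRUE)

# Confusion matrix: rows are true groups, columns are predicted groups.
# Using factor() with levels 1:11 keeps all 11 rows and columns,
# even if some group is never predicted.
cm <- table(true = factor(vty, levels = 1:11),
            predicted = factor(qtp, levels = 1:11))
print(cm)

# Per-group error rate: 1 - (correct predictions on the diagonal / row total)
per_group_error <- 1 - diag(cm) / rowSums(cm)
round(per_group_error, 2)
```

Large off-diagonal counts in a row point to the groups that the true group is most often confused with.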
