Continuing the discussion from Repeat function on dataframe with multiple factors--sapply?:
Hey,
I did not get a notification about your edited post, hence no answer until now.
Your errors are again straightforward. The first one indicates that the package magrittr is not loaded. Run library(tidyverse) at the beginning and this one should be fixed. However, using sapply() in a dplyr chain is a bit odd. You might want to have a look at dplyr::summarise().
The second error indicates a missing object. It seems you did not define this object, so R cannot find it and throws an error.
In addition: please copy the data and question from the linked post. It helps other forum members help you if they don't have to follow links to the actual question.
Kind regards
@FactOREO I will copy the data from the linked post. Again, I am trying to run this Grubbs test on the data from each of 6 factors.
thanks
Data--
There are actually 288 observations of these 6 factors:
head(high)
#> Error in head(high): object 'high' not found
tibble::tribble(
~date.well.combined2.parm.code.....adj,
"1 2008-10-09 MW01 nox 0.0075",
"2 2008-10-09 MW07 nox 1.7000",
"3 2008-10-10 MW11 nox 4.6000",
"4 2008-10-10 MW22 nox 0.1900",
"5 2008-10-10 SW01 nox 1.4000",
"6 2008-10-21 MW04 nox 12.0000"
)
#> # A tibble: 6 × 1
#> date.well.combined2.parm.code.....adj
#> <chr>
#> 1 1 2008-10-09 MW01 nox 0.0075
#> 2 2 2008-10-09 MW07 nox 1.7000
#> 3 3 2008-10-10 MW11 nox 4.6000
#> 4 4 2008-10-10 MW22 nox 0.1900
#> 5 5 2008-10-10 SW01 nox 1.4000
#> 6 6 2008-10-21 MW04 nox 12.0000
Created on 2022-11-07 by the reprex package (v2.0.1)
The code--I tried it two ways:
outliers.grubb <- high %>%
dpplyr::group_by(well.combined2) %>%
sapply(adj, grubbs.test, na.rm = T)
#> Error in high %>% dpplyr::group_by(well.combined2) %>% sapply(adj, grubbs.test, : could not find function "%>%"
outliers.grubb <- tapply(high$well.combined2, high$adj, grubbs.test, na.rm =T)
#> Error in tapply(high$well.combined2, high$adj, grubbs.test, na.rm = T): object 'grubbs.test' not found
Created on 2022-11-07 by the reprex package (v2.0.1)
Hey,
your data is rather odd. The tibble you provided consists of only one variable instead of several, so no calculations can be done on it.
The error could not find function "%>%" is due to the missing magrittr package. Type library(tidyverse) before the code and this issue should be fixed.
The error object 'grubbs.test' not found indicates that there is no object grubbs.test defined in your workspace.
Maybe you can
a) provide valid data
b) present more of the code, especially the part that defined the object grubbs.test
To provide the data with the reprex package, you have to create it inside the reprex::reprex() call and load all necessary packages there as well. The reprex package always runs your selected code in a fresh session with no additional packages loaded and an empty workspace. Another option to provide the data and your true error messages is to use the dput() function and paste the errors.
You did not do that the first time, which is why your reprex fails with object 'high' not found.
If you provide the relevant parts of the data (it doesn't have to be the real data, just the same structure) as well as the relevant parts of your code, I will try my best to help you.
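In case the dput() route is unclear, here is a minimal base R sketch. The data frame example_df and its values are made up for illustration (they are not your data); the point is only that dput() prints code that anyone can paste to rebuild the object.

```r
# Hypothetical stand-in for the real data: same kind of structure, made-up values
example_df <- data.frame(
  adj  = c(0.5, 1.2, 3.4),
  well = factor(c("A", "B", "A"))
)

# dput() prints R code that recreates the object; paste that output into your post
dput(example_df)

# Round trip: capture the printed code as text and evaluate it again
dput_text <- paste(capture.output(dput(example_df)), collapse = "\n")
recreated <- eval(parse(text = dput_text))
all.equal(example_df, recreated)
```

For a large data set, something like dput(head(high, 20)) is usually enough to show the structure.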
Kind regards
@FactOREO -I apologize for the poor data representation. Hopefully this will be better.
I think my question may be more simple--I just want to run this package (grubbs.test) on each of my variables in "well.combined2" (there are 6 wells, and I want to test for outliers for each of the wells).
I have the "outliers" library loaded, and I was able to run grubbs.test on one column of data.
This is the code for grubbs test:
grubbs.test(x, type = 10, opposite = FALSE, two.sided = FALSE)
Do I use a loop?
thanks
Data: (with 20 rows of data)
data.frame(
date = c("2008-10-09","2008-10-09",
"2008-10-10","2008-10-10","2008-10-10","2008-10-21",
"2009-03-22","2009-03-22","2009-03-23","2009-03-23","2009-03-23",
"2009-03-24","2009-06-02","2009-06-03","2009-06-03",
"2009-06-03","2009-06-03","2009-06-03","2009-07-28",
"2009-07-29"),
adj = c(0.0075,1.7,4.6,0.19,1.4,12,
0.005,4.7,14,4.6,0.97,6.4,0.005,2.7,5.7,3.3,3.4,
1.3,0.005,12),
well.combined2 = as.factor(c("MW01","MW07",
"MW11","MW22","SW01","MW04","MW01","MW07",
"MW04","MW11","SW01","MW22","MW01","MW04",
"MW07","MW11","MW22","SW01","MW01","MW04"))
)
#> date adj well.combined2
#> 1 2008-10-09 0.0075 MW01
#> 2 2008-10-09 1.7000 MW07
#> 3 2008-10-10 4.6000 MW11
#> 4 2008-10-10 0.1900 MW22
#> 5 2008-10-10 1.4000 SW01
#> 6 2008-10-21 12.0000 MW04
#> 7 2009-03-22 0.0050 MW01
#> 8 2009-03-22 4.7000 MW07
#> 9 2009-03-23 14.0000 MW04
#> 10 2009-03-23 4.6000 MW11
#> 11 2009-03-23 0.9700 SW01
#> 12 2009-03-24 6.4000 MW22
#> 13 2009-06-02 0.0050 MW01
#> 14 2009-06-03 2.7000 MW04
#> 15 2009-06-03 5.7000 MW07
#> 16 2009-06-03 3.3000 MW11
#> 17 2009-06-03 3.4000 MW22
#> 18 2009-06-03 1.3000 SW01
#> 19 2009-07-28 0.0050 MW01
#> 20 2009-07-29 12.0000 MW04
Created on 2022-11-08 by the reprex package (v2.0.1)
Code:
outliers.grubb <- tapply(high$well.combined2, high$adj, grubbs.test, na.rm =T)
#> Error in tapply(high$well.combined2, high$adj, grubbs.test, na.rm = T): object 'grubbs.test' not found
Created on 2022-11-08 by the reprex package (v2.0.1)
Alright, now we are cooking with gas.
First, I had a look at the documentation of the outliers::grubbs.test() function. It has no na.rm argument, so passing one will always give an error, since unused arguments cause an error.
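A small base R sketch of that point. range_width() and drop_na() are hypothetical helpers made up for this example (range_width() is only a stand-in for any function without an na.rm argument, it is not grubbs.test() itself):

```r
# Hypothetical stand-in for a function that has no na.rm argument
range_width <- function(x) max(x) - min(x)

# Passing na.rm anyway fails with an "unused argument" error;
# tryCatch() turns the error into its message so the script keeps running
res <- tryCatch(
  range_width(c(1, 5, NA), na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
res

# Dropping the NAs before the call works with any function
drop_na <- function(x) x[!is.na(x)]
width <- range_width(drop_na(c(1, 5, NA)))
width
```

If your adj column does contain missing values, filtering them out before the test in this way is one option.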
Second, in your tapply() call you switched the order of the X and INDEX arguments, so grubbs.test() was applied to the factor high$well.combined2 grouped by high$adj, and no numeric calculation is possible on a factor. Switch the order, and tapply() will no longer throw an error about the wrong positioning.
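As a toy illustration of the argument order (values and groups are made-up vectors, and mean() stands in for the test function):

```r
# Hypothetical toy data: a numeric measurement and a grouping factor
values <- c(1, 2, 3, 10, 20, 30)
groups <- factor(c("a", "a", "a", "b", "b", "b"))

# X = the values to work on, INDEX = the factor that defines the groups
group_means <- tapply(X = values, INDEX = groups, FUN = mean)
group_means
```

Naming the arguments explicitly, as above, makes it harder to mix them up.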
Last but not least, the error object 'grubbs.test' not found indicates that R does not know what grubbs.test() is and assumes it is an undefined object. Since it is in fact a function, either you forgot to load library(outliers) (in your regular session or just in the reprex?) or there is a typo. The following code, which runs once the package is loaded, indicates that this is indeed a loading issue:
library(outliers)
outliers.grubb <- tapply(high$adj, high$well.combined2, grubbs.test)
#> Warning in sqrt(s): NaNs produced
outliers.grubb
#> $MW01
#>
#> Grubbs test for one outlier
#>
#> data: X[[i]]
#> G = 1.5, U = 0.0, p-value < 2.2e-16
#> alternative hypothesis: highest value 0.0075 is an outlier
#>
#>
#> $MW04
#>
#> Grubbs test for one outlier
#>
#> data: X[[i]]
#> G = 1.473854, U = 0.034557, p-value = 0.03486
#> alternative hypothesis: lowest value 2.7 is an outlier
#>
#>
#> $MW07
#>
#> Grubbs test for one outlier
#>
#> data: X[[i]]
#> G = 1.120897, U = 0.057692, p-value = 0.2316
#> alternative hypothesis: lowest value 1.7 is an outlier
#>
#>
#> $MW11
#>
#> Grubbs test for one outlier
#>
#> data: X[[i]]
#> G = 1.1547, U = 0.0000, p-value = 2.846e-08
#> alternative hypothesis: lowest value 3.3 is an outlier
#>
#>
#> $MW22
#>
#> Grubbs test for one outlier
#>
#> data: X[[i]]
#> G = 1.01108, U = 0.23329, p-value = 0.4814
#> alternative hypothesis: lowest value 0.19 is an outlier
#>
#>
#> $SW01
#>
#> Grubbs test for one outlier
#>
#> data: X[[i]]
#> G = 1.125833, U = 0.049375, p-value = 0.214
#> alternative hypothesis: lowest value 0.97 is an outlier
Created on 2022-11-08 by the reprex package (v2.0.1)
So your main problems were the swapped arguments in tapply() and the unused argument na.rm. Fix those points and you should be good to go.
Kind regards
@FactOREO --thanks again! Switching that order in the tapply did the trick!
One last question regarding this: The output is an array. I tried to save this as a csv, but could not figure out how to first convert it to a data frame. The "as.dataframe" did not work, nor did the "write.table".
Is there a way to convert this to a data frame?
Thanks!
Thankfully there is the broom package, and it has your function covered:
lapply(outliers.grubb, broom::tidy) |>
collapse::unlist2d()
#> .id statistic p.value method
#> 1 MW01 1.50000000 0.000000e+00 Grubbs test for one outlier
#> 2 MW01 0.00000000 0.000000e+00 Grubbs test for one outlier
#> 3 MW04 1.47385449 3.486068e-02 Grubbs test for one outlier
#> 4 MW04 0.03455686 3.486068e-02 Grubbs test for one outlier
#> 5 MW07 1.12089708 2.316314e-01 Grubbs test for one outlier
#> 6 MW07 0.05769231 2.316314e-01 Grubbs test for one outlier
#> 7 MW11 1.15470054 2.845912e-08 Grubbs test for one outlier
#> 8 MW11 0.00000000 2.845912e-08 Grubbs test for one outlier
#> 9 MW22 1.01107946 4.813584e-01 Grubbs test for one outlier
#> 10 MW22 0.23328875 4.813584e-01 Grubbs test for one outlier
#> 11 SW01 1.12583327 2.139752e-01 Grubbs test for one outlier
#> 12 SW01 0.04937459 2.139752e-01 Grubbs test for one outlier
#> alternative
#> 1 highest value 0.0075 is an outlier
#> 2 highest value 0.0075 is an outlier
#> 3 lowest value 2.7 is an outlier
#> 4 lowest value 2.7 is an outlier
#> 5 lowest value 1.7 is an outlier
#> 6 lowest value 1.7 is an outlier
#> 7 lowest value 3.3 is an outlier
#> 8 lowest value 3.3 is an outlier
#> 9 lowest value 0.19 is an outlier
#> 10 lowest value 0.19 is an outlier
#> 11 lowest value 0.97 is an outlier
#> 12 lowest value 0.97 is an outlier
Created on 2022-11-08 by the reprex package (v2.0.1)
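The result is a data.frame with all relevant information about the calculated test statistics. Each well appears twice because grubbs.test() returns two statistics, G and U, and each gets its own row.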
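If you would rather avoid extra packages, a base R sketch along the same lines might look like this. It is an assumption-laden illustration, not the broom approach: the data are made up, t.test() stands in for grubbs.test() (both return "htest" objects with statistic, p.value and alternative elements), and the file name in the last comment is hypothetical.

```r
# Hypothetical toy data; t.test() stands in for grubbs.test()
values <- c(1, 2, 3, 4, 10, 20, 30, 45)
groups <- factor(rep(c("a", "b"), each = 4))
tests  <- tapply(values, groups, t.test)

# Pull the fields of interest out of each htest object, one row per group
rows <- lapply(seq_along(tests), function(i) {
  tst <- tests[[i]]
  data.frame(
    group       = names(tests)[i],
    statistic   = unname(tst$statistic[1]),  # first statistic only
    p.value     = tst$p.value,
    alternative = tst$alternative
  )
})
result <- do.call(rbind, rows)
result

# The combined data frame can then be saved, for example:
# write.csv(result, "grubbs_results.csv", row.names = FALSE)
```

Note that this keeps only the first statistic per group, so you get one row per well instead of two.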
@FactOREO --Thank you this works! Super helpful!
In your code: "|>" . What is this doing?
Thanks again!
Just as a side note: marking this answer as the solution will remove the previous one (which was related to your original question). To avoid confusion for other users, consider re-accepting my previous post regarding your original request.
The |> is R's native pipe. It works similarly to the magrittr pipe (%>%) and chains functions together. Using a |> f(b) is the same as f(a, b), and a |> f() |> g() is the same as g(f(a)). It lacks some of the magrittr pipe's features, but for the vast majority of tasks the native pipe is sufficient.
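A quick sketch of both equivalences with a made-up vector x (the native pipe needs R 4.1.0 or newer):

```r
# Requires R >= 4.1.0 for the native pipe
x <- c(1, 4, 9)

# a |> f(b) is the same as f(a, b)
first_two_piped  <- x |> head(2)
first_two_nested <- head(x, 2)
identical(first_two_piped, first_two_nested)

# a |> f() |> g() is the same as g(f(a))
total_piped  <- x |> sqrt() |> sum()
total_nested <- sum(sqrt(x))
identical(total_piped, total_nested)
```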
Kind regards