How to calculate mean, standard deviation and number of samples from a data frame

The dput you posted contains character data. For the original layout, never use a double header; save that for reports, as @gueyenono noted.
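If the profile columns arrive as character, they have to be converted to numeric before any arithmetic or summary statistics. Here is a minimal base R sketch. It assumes a hypothetical two-row data frame `raw` built from the first values in the reprex, with every column after `clade_name` holding numbers stored as text:

```r
# Hypothetical character-typed input, mimicking a dput where numbers were read as text
raw <- data.frame(
  clade_name = c("Actinomyces_odontolyticus", "Bifidobacterium_adolescentis"),
  ERR275252_profile = c("0", "0.26989"),
  ERR260268_profile = c("0", "0.21046"),
  stringsAsFactors = FALSE
)

# Convert every column except the first (clade_name) to numeric
raw[-1] <- lapply(raw[-1], as.numeric)

str(raw)
```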

Here's a reprex (see the FAQ on making a minimal reproducible example) that illustrates the need to rethink the data layout along the tidy lines identified by @phiggins.

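# arcsine square-root transform of every column after the first;
# the profile values are percentages, so they are rescaled to proportions first
mk_arcsq <- function(x) asin(sqrt(x[, 2:length(x)] / 100))

# OP's dput was character; the profile values are numeric here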

DF <- structure(list(clade_name = c(
  "Actinomyces_odontolyticus", "Bifidobacterium_adolescentis",
  "Bifidobacterium_bifidum", "Bifidobacterium_longum", "Bifidobacterium_pseudocatenulatum",
  "Collinsella_aerofaciens", "Collinsella_intestinalis", "Collinsella_stercoris"
), ERR275252_profile = c(
  0, 0.26989, 0, 2.46071, 0, 3.91749,
  0.01625, 0.06886
), ERR260268_profile = c(
  0, 0.21046, 0.07668,
  1.68152, 0, 1.27748, 0, 0.01677
), ERR260265_profile = c(
  0.00341,
  0, 0, 0, 0, 0.68346, 0.00148, 0.01178
), ERR260264_profile = c(
  0,
  2.00023, 0.33464, 1.76625, 0, 3.54635, 0.01948, 0.07679
), ERR260263_profile = c(
  0.03155,
  1.24158, 0, 1.91239, 0, 3.45814, 0.00618, 0.08548
), ERR260261_profile = c(
  0,
  0.9991, 0, 0, 0, 1.35682, 0.00277, 0.0159
)), class = c(
  "spec_tbl_df",
  "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L), spec = structure(list(
  cols = list(clade_name = structure(list(), class = c(
    "collector_character",
    "collector"
  )), ERR275252_profile = structure(list(), class = c(
    "collector_double",
    "collector"
  )), ERR260268_profile = structure(list(), class = c(
    "collector_double",
    "collector"
  )), ERR260265_profile = structure(list(), class = c(
    "collector_double",
    "collector"
  )), ERR260264_profile = structure(list(), class = c(
    "collector_double",
    "collector"
  )), ERR260263_profile = structure(list(), class = c(
    "collector_double",
    "collector"
  )), ERR260261_profile = structure(list(), class = c(
    "collector_double",
    "collector"
  ))), default = structure(list(), class = c(
    "collector_guess",
    "collector"
  )), skip = 1L
), class = "col_spec"))

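# apply the arcsine square-root transform to the profile columns
DF[, 2:length(DF)] <- mk_arcsq(DF)

# one bacterium (row 1) across all six samples
o <- DF[1, ]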

pander::pander(o)

| clade_name | ERR275252_profile | ERR260268_profile | ERR260265_profile | ERR260264_profile | ERR260263_profile | ERR260261_profile |
|---|---|---|---|---|---|---|
| Actinomyces_odontolyticus | 0 | 0 | 0.00584 | 0 | 0.01776 | 0 |

Created on 2021-01-01 by the reprex package (v0.3.0.9001)

The last table shows why pivot_longer() is needed, as @phiggins suggested: if the bacterium is the observation, its ERR*_profile values are spread across the columns of a single row, and in that layout they will not provide informative statistics.
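Here is a minimal sketch of that reshaping. It assumes the dplyr and tidyr packages are installed. `wide`, `long` and `stats` are illustrative names, and only a few untransformed values from the reprex are used. After pivot_longer() each row is one bacterium in one sample, so the mean, standard deviation and number of samples come from a grouped summarise():

```r
library(dplyr)
library(tidyr)

# Small wide table using a subset of the (untransformed) values from the reprex
wide <- data.frame(
  clade_name = c("Actinomyces_odontolyticus", "Bifidobacterium_adolescentis"),
  ERR275252_profile = c(0, 0.26989),
  ERR260268_profile = c(0, 0.21046),
  ERR260265_profile = c(0.00341, 0),
  stringsAsFactors = FALSE
)

# One row per bacterium per sample
long <- wide %>%
  pivot_longer(
    cols = -clade_name,
    names_to = "sample",
    values_to = "abundance"
  )

# Per-bacterium summaries across samples
stats <- long %>%
  group_by(clade_name) %>%
  summarise(
    mean_abundance = mean(abundance),
    sd_abundance = sd(abundance),
    n_samples = n(),
    .groups = "drop"
  )

stats
```

The same pipeline can be run on `DF` after the arcsine square-root transform. Note that n() counts every sample, including those where the abundance is 0.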
