Trying to convert column to numeric

ctheisen · September 6, 2022, 5:52pm

I'm trying to take the following dataframe and take the averages of some of the columns (like reference_1 and reference_2) but I can't seem to get the columns to register as numeric.

Data:

> dput(head(all.data))
structure(list(`Identified Proteins (3404)` = c("Casein kinase II subunit beta", 
"No_match_UP_pBlast", "DNA helicase", "Methanethiol oxidase_protein transport and brown fat cell differentation_pBlast", 
"No_match_UP_pBlast", "Thylakoid lumenal protein, chlorplastic_pBlast"
), `Accession Number` = c("A0A383V1G7", "A0A383V1H7", "A0A383V1J2", 
"A0A383V1M7", "A0A383V1R6", "A0A383V1S1"), reference_1 = c("8.0699999999999994E-2", 
"1.52E-2", "-0.1225", "4.4299999999999999E-2", "-2.41E-2", "-5.5999999999999999E-3"
), reference_2 = c("-0.12620000000000001", "-3.3999999999999998E-3", 
"0.113", "-5.1999999999999998E-2", "4.9799999999999997E-2", "1.5100000000000001E-2"
), An_24h_1 = c("0.4803", "-0.247", "0.2999", "0.2437", "-0.97709999999999997", 
"-1.0880000000000001"), An_24h_2 = c("0.2009", "-0.40379999999999999", 
"0.30609999999999998", "0.24909999999999999", "-1.0182", "-1.0642"
), An_28d_1 = c("-0.39629999999999999", "-1.0501", "0.94269999999999998", 
"-0.17030000000000001", "0.46289999999999998", "-1.3814"), An_28d_2 = c("9.9299999999999999E-2", 
"-0.53979999999999995", "0.83540000000000003", "-0.1983", "0.35149999999999998", 
"-1.4052"), C_24h_1 = c("0.17380000000000001", "0.22459999999999999", 
"0.13550000000000001", "-0.15820000000000001", "-0.96909999999999996", 
"0.29039999999999999"), C_24h_2 = c("0.32090000000000002", "0.20250000000000001", 
"-0.1242", "-3.0700000000000002E-2", "-0.74780000000000002", 
"9.8599999999999993E-2"), C_28d_1 = c("0.61170000000000002", 
"-0.35260000000000002", "0.96399999999999997", "9.4200000000000006E-2", 
"-0.40229999999999999", "-0.99809999999999999"), C_28d_2 = c("0.49249999999999999", 
"-0.30109999999999998", "0.55520000000000003", "3.44E-2", "-0.3841", 
"-0.79159999999999997")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

My current code:

all.data<-readxl::read_xlsx("Output of all algae proteins for blast sequences.xlsx") %>% 
  select("Identified Proteins (3404)","Accession Number","Reference...3",
         "Reference...4","Anaerobic_24h...5","Anaerobic_24h...6",
         "Anaerobic_28d...7","Anaerobic_28d...8","Citric_24h...9",
         "Citric_24h...10","Citric_28d...11","Citric_28d...12")

colnames(all.data)<-gsub("Reference...3","reference_1",colnames(all.data))
colnames(all.data)<-gsub("Reference...4","reference_2",colnames(all.data))
colnames(all.data)<-gsub("Anaerobic_24h...5","An_24h_1",colnames(all.data))
colnames(all.data)<-gsub("Anaerobic_24h...6","An_24h_2",colnames(all.data))
colnames(all.data)<-gsub("Anaerobic_28d...7","An_28d_1",colnames(all.data))
colnames(all.data)<-gsub("Anaerobic_28d...8","An_28d_2",colnames(all.data))
colnames(all.data)<-gsub("Citric_24h...9","C_24h_1",colnames(all.data))
colnames(all.data)<-gsub("Citric_24h...10","C_24h_2",colnames(all.data))
colnames(all.data)<-gsub("Citric_28d...11","C_28d_1",colnames(all.data))
colnames(all.data)<-gsub("Citric_28d...12","C_28d_2",colnames(all.data))
colnames(all.data)<-gsub("Identified Proteins (3404)","Protein",colnames(all.data))

I've tried using the as.numeric() command, but get the following:

> all.data %>% 
+   as.numeric(all.data$reference_1)
Error in all.data %>% as.numeric(all.data$reference_1) : 
  'list' object cannot be coerced to type 'double'

And I've tried doing the following, which worked in my prior work but now it just puts NA into the new column:

> all.data<-all.data %>% 
+   rowwise() %>% 
+   mutate(reference_avg=mean(c_across(c(`reference_1`,`reference_2`))))
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: Problem while computing `reference_avg = mean(c_across(c(reference_1, reference_2)))`.
ℹ argument is not numeric or logical: returning NA
ℹ The warning occurred in row 1.
2: Problem while computing `reference_avg = mean(c_across(c(reference_1, reference_2)))`.
ℹ argument is not numeric or logical: returning NA
ℹ The warning occurred in row 2.
3: Problem while computing `reference_avg = mean(c_across(c(reference_1, reference_2)))`.
ℹ argument is not numeric or logical: returning NA
ℹ The warning occurred in row 3.

I'm still very much a beginner in any sort of coding, so any advice is very helpful.

rene_at_coco · September 6, 2022, 6:11pm

It should be:

all.data <- all.data %>%
    mutate( # create a new variable
    reference_1 = as.numeric(reference_1) # named reference_1 where reference_1 is numeric
  )

ctheisen · September 6, 2022, 6:20pm

That works, thanks. Do you know of a shortcut to due this to all the columns except the first without having to do it as multiple lines of code, just to make the code a little neater?

ctheisen · September 6, 2022, 6:32pm

And actually, it did convert the column to numeric, but I still can't get it to find the average. Instead I get the following:

> all.data<-all.data %>% 
+   mutate(ref_avg=mean(reference_1,reference_2,na.rm=TRUE))
Error in `mutate()`:
! Problem while computing `ref_avg = mean(reference_1, reference_2, na.rm = TRUE)`.
Caused by error in `mean.default()`:
! 'trim' must be numeric of length one
Run `rlang::last_error()` to see where the error occurred.

rene_at_coco · September 6, 2022, 6:47pm

# mean for each variable
all.data %>%
  mutate(
    across(`Accession Number`:C_28d_2, as.numeric) # mutate selected variables to numeric
  ) %>%
  summarise( # mean of each variable
    ref_1_avg = mean(reference_1), # mean of ref 1
    ref_2_avg = mean(reference_2) # mean of ref 2
  )
  
# mean for reference variables 
all.data %>%
  mutate(
    across(`Accession Number`:C_28d_2, as.numeric) # mutate selected variables to numeric
  ) %>%
  select( # select columns to pivot 
    `Identified Proteins (3404)`, 
    reference_1,
    reference_2
  ) %>%
  pivot_longer(
    !`Identified Proteins (3404)` # make the data table, long
  ) %>%
  summarise( # mean of refence variables
    ref_avg = mean(value)
  )

ctheisen · September 6, 2022, 6:57pm

Sorry, there may be some confusion, so each row is a separate observation. I want a column that has the average of reference_1 and reference_2 across each row. So I can't just do an average of the whole column. So like I want a reference_avg for each row, then a An_24h_avg for each row, so on and so forth, does that make sense?

rene_at_coco · September 6, 2022, 8:28pm

all.data %>%
  mutate(
    across(`Accession Number`:C_28d_2, as.numeric) # mutate selected variables to numeric
  ) %>%
  mutate(
    ref_avg = (reference_1 + reference_2) /2)

rene_at_coco · September 9, 2022, 6:37pm

Or, like the solution found in this thread (rowwise):

R-studio doesn't appear to be recognizing a numerical variable tidyverse

Here you are literally asking R to average two text strings, "Reference A" and "Reference B". To reference non-syntactic variable names you have to enclose them between backticks, not double-quotes. Also, the mean() function works "by column" not "by row". If you want to calculate the mean of values that come from more than one column, you need to use rowwise() and c_across() to reference the columns. The code should look something like this library(dplyr) full_data <- full_data %>% rowwise() %>% mutate(reference_avg = mean(c_across(c(`Reference A`,`Reference B`)))) If this doesn't solve your problem, please provide a proper REPRoducible EXample (reprex) illustrating your issue…

system · October 21, 2022, 6:38pm

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.