Change to dbl after importing a CSV created a factor

#1

[shyly mumbling] Hi, I'm Jason, an #rstatsnewbie - and I have a problem.

I’m trying to use R exclusively, for the first time, to do some pilot analyses. I’ve used read.csv to load the raw data into RStudio. However, I noticed (so far) that at least one variable/vector was set as type factor (i.e., <fct>). I’ve searched the community posts and am going through R4DS and R for SAS and SPSS Users (2nd ed.), but I haven't figured out how to convert the vector/variable to a double (i.e., <dbl>). I realize this is a basic question and likely very easily accomplished, but I'm stumped. Any help would be greatly appreciated.

Cheers.


#2

First, if using read.csv(), I would set stringsAsFactors = FALSE, which will result in the data coming in as character instead of factor. To go further, we need to know what it is about the data that prevents it from being interpreted as a number. Can you post a small sample of the data, preferably as a reproducible example (reprex)?
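A minimal sketch of what stringsAsFactors changes, using a small inline CSV made up for illustration (the column name mirrors the thread; the values are hypothetical). Since R 4.0.0 the default is already stringsAsFactors = FALSE, so the first call sets TRUE explicitly to reproduce the older behaviour.

```r
# Hypothetical sample: one column mixes numbers with a spreadsheet error value
csv_lines <- c(
  "participant_id,asq_9a",
  "IP005,1.5",
  "IP006,#DIV/0!",
  "IP008,3"
)

# stringsAsFactors = TRUE (the pre-R 4.0 default): text columns become factors
as_factor <- read.csv(text = csv_lines, stringsAsFactors = TRUE)
class(as_factor$asq_9a)
#> [1] "factor"

# stringsAsFactors = FALSE: the same column stays character
as_chr <- read.csv(text = csv_lines, stringsAsFactors = FALSE)
class(as_chr$asq_9a)
#> [1] "character"
```

Either way the column is not numeric, because "#DIV/0!" cannot be parsed as a number; that value is what has to be dealt with.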

As a guess, you may be able to use the sub() or gsub() functions to replace undesirable characters with empty strings and then use as.numeric().
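A short base-R sketch of both ideas, with made-up values. One caveat worth knowing: calling as.numeric() directly on a factor returns its internal level codes, not the values, so convert to character first.

```r
# Hypothetical factor column, as read.csv() with stringsAsFactors = TRUE would create
asq <- factor(c("10", "2", "3.5"))

# Pitfall: as.numeric() on a factor returns the internal level codes
as.numeric(asq)
#> [1] 1 2 3

# Converting via character gives the actual values
as.numeric(as.character(asq))
#> [1] 10.0  2.0  3.5

# If stray characters are the problem, strip them before converting
raw <- c("4", " 5 ", "6 hrs")  # hypothetical messy entries
as.numeric(gsub("[^0-9.]", "", raw))
#> [1] 4 5 6

# Caution: the same pattern turns "#DIV/0!" into "0", which is wrong;
# error codes like this are better mapped to NA explicitly
gsub("[^0-9.]", "", "#DIV/0!")
#> [1] "0"
```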


#3

Sorry this took so long; I was trying to use reprex with an .Rmd file.
As you can see below, a few of the asq_*a variables were read as factors instead of doubles, unlike the majority of the asq variables.

# ---
# title: "reprex vector type"
# author: "Jason"
# date: "4/12/2019"
# output: html_document
# ---

library(tidyverse)
library(reprex)
# library(here)

## IMPORT DATA FOR THE PILOT ANALYSES

data <- read.csv(url('https://raw.githubusercontent.com/BrainStormCenter/ASQ_pilot/master/ASQ_pilot_2019_04_09.csv'), header = TRUE)
data <- as_tibble(data)
arrange(data, Count)
#> # A tibble: 120 x 166
#>    participant_id redcap_event_na… Count   sex demo_dob race___1 race___2
#>    <fct>          <fct>            <int> <int> <fct>       <int>    <int>
#>  1 IP005          baseline_data_a…     1     1 1986-07…        1        0
#>  2 IP006          baseline_data_a…     2     1 1989-02…        1        0
#>  3 IP008          baseline_data_a…     3     2 1993-02…        1        0
#>  4 IP009          baseline_data_a…     4     2 1994-05…        1        0
#>  5 IP013          baseline_data_a…     5     1 1992-12…        0        1
#>  6 IP015          baseline_data_a…     6     1 1994-02…        1        0
#>  7 IP016          baseline_data_a…     7     1 1993-04…        1        0
#>  8 IP017          baseline_data_a…     8     1 1991-07…        1        0
#>  9 IP019          baseline_data_a…     9     2 1993-05…        1        0
#> 10 IP020          baseline_data_a…    10     2 1994-01…        1        0
#> # … with 110 more rows, and 159 more variables: race___3 <int>,
#> #   race___4 <int>, race___5 <int>, race___6 <int>, ethnic_category <int>,
#> #   employed <int>, cohabitation <fct>, marital_status <int>,
#> #   education <int>, prior_pain_about <lgl>, chronic_pain <int>,
#> #   Groups2 <fct>, condition <int>, paintype <int>, painduration <fct>,
#> #   paintype2 <int>, paintype4 <int>, GroupType <int>, Groups3 <fct>,
#> #   mcgill1 <int>, mcgill2 <int>, mcgill3 <int>, mcgill4 <int>,
#> #   mcgill5 <int>, mcgill6 <int>, mcgill7 <int>, mcgill8 <int>,
#> #   mcgill9 <int>, mcgill10 <int>, mcgill11 <int>, mcgill12 <int>,
#> #   mcgill13 <int>, mcgill14 <int>, mcgill15 <int>, mcgill16 <int>,
#> #   mcgill17 <int>, mcgill18 <int>, mcgill19 <int>, mcgill20 <int>,
#> #   bdi1 <int>, bdi2 <int>, bdi3 <int>, bdi4 <int>, bdi5 <int>,
#> #   bdi6 <int>, bdi7 <int>, bdi8 <int>, bdi9 <int>, bdi10 <int>,
#> #   bdi11 <int>, bdi12 <int>, bdi13 <int>, bdi14 <int>, bdi15 <int>,
#> #   bdi16 <int>, bdi17 <int>, bdi18 <int>, bdi19 <int>, bdi20 <int>,
#> #   bdi21 <int>, pdi1 <int>, pdi2 <int>, pdi3 <int>, pdi4 <int>,
#> #   pdi5 <int>, pdi6 <int>, pdi7 <int>, asq_1 <dbl>, asq_1a <dbl>,
#> #   asq_2 <dbl>, asq_2a <dbl>, asq_3 <dbl>, asq_3a <dbl>, asq_4 <dbl>,
#> #   asq_4a <dbl>, asq_5 <dbl>, asq_5a <dbl>, asq_6 <dbl>, asq_6a <dbl>,
#> #   asq_7 <dbl>, asq_7a <dbl>, asq_8 <dbl>, asq_8a <dbl>, asq_9 <dbl>,
#> #   asq_9a <fct>, asq_10 <dbl>, asq_10a <dbl>, asq_11 <dbl>,
#> #   asq_11a <fct>, asq_12 <dbl>, asq_12a <dbl>, asq_13 <dbl>,
#> #   asq_13a <dbl>, asq_14 <dbl>, asq_14a <fct>, asq_15 <dbl>,
#> #   asq_15a <fct>, X <lgl>, typical_alc_use1 <dbl>,
#> #   typical_alc_use2 <dbl>, …
tmp <- select(data, participant_id, contains("asq_15"))
#View(tmp)
#   TRY TO CONVERT FACTOR TO DOUBLE
tmp2 <- mutate_if(tmp, is.factor, as.numeric(as.character(tmp,"asq_15a")))
#> Warning in is_fun_list(.x): NAs introduced by coercion
#> Can't create call to non-callable object

Created on 2019-04-12 by the reprex package (v0.2.1)
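Why the attempt above fails, as far as I can tell: the third argument of mutate_if() should be a function (or a ~ formula), but as.numeric(as.character(tmp, "asq_15a")) is evaluated immediately, producing a vector of NAs (hence the coercion warning), which dplyr then cannot call. In dplyr the intended call would be mutate_if(tmp, is.factor, ~ as.numeric(as.character(.))). Below is a base-R equivalent on a small hypothetical data frame standing in for tmp; entries such as "#DIV/0!" still become NA with a warning.

```r
# Hypothetical data frame mimicking `tmp`: one numeric and one factor column
tmp <- data.frame(
  participant_id = c("IP005", "IP006", "IP008"),
  asq_15 = c(4, 6, 5),
  asq_15a = factor(c("5.5", "10", "#DIV/0!")),
  stringsAsFactors = FALSE
)

# The conversion has to be passed as a *function*, not as an already-computed value
to_number <- function(x) as.numeric(as.character(x))

# Apply it only to the factor columns
is_fct <- vapply(tmp, is.factor, logical(1))
tmp2 <- tmp
tmp2[is_fct] <- lapply(tmp2[is_fct], to_number)  # "#DIV/0!" -> NA, with a warning

str(tmp2)
```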


#4

Since you included a link to your data, I read it in and found that the asq columns that are factors contain the values #DIV/0! or Unknown. You will have to replace these with something that makes sense for you, perhaps NA or Inf, and then convert to numeric.

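library(dplyr)

# stringsAsFactors = TRUE made explicit: the default became FALSE in R 4.0.0
data <- read.csv(url('https://raw.githubusercontent.com/BrainStormCenter/ASQ_pilot/master/ASQ_pilot_2019_04_09.csv'), 
                 header = TRUE, stringsAsFactors = TRUE)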
dataFactors <- select_if(data, is.factor)
colnames(dataFactors)
#>  [1] "participant_id"    "redcap_event_name" "demo_dob"         
#>  [4] "cohabitation"      "Groups2"           "painduration"     
#>  [7] "Groups3"           "asq_9a"            "asq_11a"          
#> [10] "asq_14a"           "asq_15a"           "calibrationvisit"
levels(dataFactors$asq_9a)
#>  [1] ""        "#DIV/0!" "1"       "1.5"     "10"      "2"       "3"      
#>  [8] "3.5"     "4"       "5"       "6"       "8"
levels(dataFactors$asq_11a)
#> [1] ""        "#DIV/0!" "10"      "15"      "3"       "4"       "5"      
#> [8] "6"       "8"
levels(dataFactors$asq_14a)
#>  [1] ""        "10"      "12"      "15"      "20"      "3"       "4"      
#>  [8] "5"       "6"       "7"       "8"       "9"       "Unknown"
levels(dataFactors$asq_15a)
#>  [1] ""        "#DIV/0!" "1"       "10"      "12"      "15"      "2"      
#>  [8] "3"       "4"       "5"       "5.5"     "6"       "7"       "7.5"    
#> [15] "8"       "9"

Created on 2019-04-12 by the reprex package (v0.2.1)
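A minimal sketch of doing that replacement in R rather than in a text editor, assuming the problem values are exactly "#DIV/0!", "Unknown", and the empty string, as in the levels listed above. The small data frame is made up for illustration.

```r
# Hypothetical stand-in for the factor columns found above
pilot <- data.frame(
  asq_9a  = factor(c("1.5", "#DIV/0!", "", "3")),
  asq_14a = factor(c("10", "Unknown", "4", ""))
)

bad_values <- c("#DIV/0!", "Unknown", "")

clean_numeric <- function(x) {
  x <- as.character(x)        # work with the text, not the factor codes
  x[x %in% bad_values] <- NA  # map the known problem values to NA
  as.numeric(x)               # everything left should now parse cleanly
}

# pilot[] keeps the data frame structure while replacing every column
pilot[] <- lapply(pilot, clean_numeric)
str(pilot)
```

If Inf suits your analysis better for the #DIV/0! cases, the same function could assign Inf to those entries instead of NA.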


#5

Thanks for the quick response. I will make the suggested changes with a text editor and try again. I really appreciate the help.

Cheers,
Jason


#6

@FJCC Thanks! Your diagnosis was spot on. My next task was trying to figure out how to remove the errant values in R, but using a text editor was definitely faster and easier.

Now I'm on to figuring out how to create and save the average asq_*a values as a new vector.

Thanks again.

Cheers,
Jason

#rstatsnewbie
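For the averaging step, a base-R sketch that computes a row-wise mean of the asq_*a items, assuming those columns are already numeric. The toy values are hypothetical, and only three items are used; for the real data the names would run from asq_1a to asq_9a (or whichever items belong in the score).

```r
# Hypothetical data with three of the asq_*a items
pilot <- data.frame(
  asq_1a = c(2, 4, NA),
  asq_2a = c(4, NA, NA),
  asq_3a = c(6, 8, NA)
)

# Item columns listed explicitly; e.g. paste0("asq_", 1:9, "a") for nine items
item_cols <- paste0("asq_", 1:3, "a")

# Row-wise mean over the available items; a row with every item missing gives NaN
pilot$asq_light <- rowMeans(pilot[item_cols], na.rm = TRUE)
pilot$asq_light
#> [1]   4   6 NaN
```

Because mean(x, na.rm = TRUE) equals mean(x) when nothing is missing, a single call with na.rm = TRUE covers both branches of the case_when() used later in the thread.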


#7

The read.csv() function has a handy argument called na.strings that will likely work well in this scenario (and it means you don't have to change your dataset at all 🙂).

The documentation describes this argument as

a character vector of strings which are to be interpreted as NA values.

By default, blanks or "NA" are read as NA for numeric variables.

In your case, you could add na.strings = c("Unknown", "#DIV/0!") to read.csv() when you read the dataset in, so that these values are treated as NA and the columns are read correctly as numbers instead of factors.
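A minimal sketch with a made-up inline CSV. One detail to be aware of: supplying na.strings replaces the default ("NA"), so it seems safest to list "NA" (and "" if empty text cells should also be missing) alongside the new values.

```r
# Hypothetical sample mimicking the problem columns
csv_lines <- c(
  "participant_id,asq_9a,asq_14a",
  "IP005,1.5,10",
  "IP006,#DIV/0!,Unknown",
  "IP008,,4"
)

pilot <- read.csv(
  text = csv_lines,
  na.strings = c("NA", "", "Unknown", "#DIV/0!"),  # keep "NA": this replaces the default
  stringsAsFactors = FALSE
)

# asq_9a and asq_14a now come in as numbers (numeric and integer respectively)
sapply(pilot, class)
```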


#8

@aosmith Thanks! That is a great thing to know and will be very handy in the future!!!

Cheers,
Jason


#9

How about using read_csv() from readr to create a tibble? It never converts strings to factors (effectively stringsAsFactors = FALSE) and it is faster.


#10

Sounds good. Would the code look like the following?

dat <- read.csv(as_tibble("pilotData.csv", header = TRUE))

Thanks for the suggestion and any other help.

Cheers,
Jason

#rstatsnewbie


#11

No. It'll be like this, provided you've loaded readr previously:

dat <- read_csv("pilotData.csv")

The read_csv() analogue of read.csv()'s header argument is col_names. Both default to TRUE, so you don't need to specify them explicitly (though it does no harm if you do).

Likewise, the read_csv() analogue of na.strings is na.

For more details, see the documentation here and here.


#12

@Yarnabrina Thanks for the information.
Would this then be correct:

library(readr)
dat <- read_csv("pilotData.csv", col_names = TRUE, na = "NA")

Does this command exist?

 skip_empty_cols = TRUE

Cheers,
Jason


#13

Greetings @Yarnabrina and everyone,
Does read_csv do something different to the data than read.csv?
I ask because when I switched to the former, the part of my code that creates a new variable stopped working. Below are the commands and error messages. I can upload a reprex if that would help.

Commands

#	CREATING THE ASQ-LIGHT VARIABLE 
data3 <- 
	mutate(data2,
		x = pmap_dbl(list(asq_1a, asq_2a, asq_3a, asq_4a, asq_5a,
						  asq_6a, asq_7a, asq_8a, asq_9a), function(...){
			row_values <- unlist(list(...))
			number_of_NAs <- sum(is.na(row_values))
			map_dbl(number_of_NAs, ~ case_when(
				.x == 0 ~ mean(row_values),
				.x >= 1 ~ mean(row_values, na.rm = TRUE) #,
				# .x == 1 ~ mean(row_values, na.rm = TRUE),
				# .x == 2 ~ mean(row_values, na.rm = TRUE),
				# .x == 3 ~ mean(row_values, na.rm = TRUE)
			))
		})
	) %>% 
	rename(asq_light = x )

Error messages

argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
[... truncated]

I cut the error message short, but it seems to repeat for every subject.
I will keep looking for the difference between the two commands that can explain the problem but any help is greatly appreciated.
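A likely explanation, which the parser message in the next post seems to support (asq_9a, asq_11a, asq_14a and asq_15a are read as col_character): read_csv() typically guesses character for a column containing non-numeric entries, and mean() on a character vector returns NA with exactly this warning. read.csv() produced factors instead, which hides a quieter problem: unlist() on a mix of numbers and a factor keeps the factor's integer level code, not its value, so that version can run without warnings while averaging the wrong numbers. Converting the asq_*a columns to numeric first (or reading them with na = c("", "NA", "#DIV/0!", "Unknown")) should avoid both. A base-R sketch with hypothetical values:

```r
# Hypothetical row values as pmap() would pass them: one item read as text
row_values <- unlist(list(2, 4, "6"))  # one character value makes them all character
mean(row_values)  # warns "argument is not numeric or logical: returning NA", gives NA

# Converting first gives the intended mean
mean(as.numeric(row_values))
#> [1] 4

# The read.csv() case: a factor item contributes its level code, not its value
item_9a <- factor(c("10", "2", "3.5"))[1]  # the value "10", stored as level code 1
unlist(list(2, 4, item_9a))
#> [1] 2 4 1
```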


#14

Here is a reprex of the problem.

###############################
#       CREATED BY:     JASON CRAGGS
#       CREATED ON:     2019-04-18
#       USAGE:          REPREX TO READ CSV FILES
###############################
#
library(tidyverse)

#       LOAD DATA (SAME FILE BOTH TIMES)
data1 <- read.csv(url('https://raw.githubusercontent.com/BrainStormCenter/ASQ_pilot/master/ASQ_pain_pilot_2019_04_18.csv'), header = TRUE)
data2 <- read_csv(url('https://raw.githubusercontent.com/BrainStormCenter/ASQ_pilot/master/ASQ_pain_pilot_2019_04_18.csv'),
                  col_names = TRUE,
                  col_types = NULL,
                  quoted_na = FALSE)
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   ID = col_character(),
#>   redcap_event = col_character(),
#>   count_asqPain = col_character(),
#>   `Good-bad` = col_character(),
#>   demo_dob = col_date(format = ""),
#>   cohabitation = col_character(),
#>   prior_pain_about = col_logical(),
#>   painduration = col_character(),
#>   Groups = col_character(),
#>   asq_9a = col_character(),
#>   asq_11a = col_character(),
#>   asq_14a = col_character(),
#>   asq_15a = col_character(),
#>   typical_alc_use1 = col_character(),
#>   typical_alc_use2 = col_character(),
#>   calibrationvisit = col_character()
#> )
#> See spec(...) for full column specifications.


#       ADD ASQ-LIGHT VARIABLE
#               THIS VERSION DOES WORK
data1.1 <-
    mutate(data1,
           x = pmap_dbl(list(asq_1a, asq_2a, asq_3a, asq_4a, asq_5a,
                          asq_6a, asq_7a, asq_8a, asq_9a), function(...){
                            row_values <- unlist(list(...))
                            number_of_NAs <- sum(is.na(row_values))
                            map_dbl(number_of_NAs, ~ case_when(
                                .x == 0 ~ mean(row_values),
                                .x >= 1 ~ mean(row_values, na.rm = TRUE)
                            ))
                          })
    ) %>%
    rename(asq_light = x )

#               THIS VERSION DOES NOT WORK
data2.1 <-
    mutate(data2,
           x = pmap_dbl(list(asq_1a, asq_2a, asq_3a, asq_4a, asq_5a,
                          asq_6a, asq_7a, asq_8a, asq_9a), function(...){
                            row_values <- unlist(list(...))
                            number_of_NAs <- sum(is.na(row_values))
                            map_dbl(number_of_NAs, ~ case_when(
                                .x == 0 ~ mean(row_values),
                                .x >= 1 ~ mean(row_values, na.rm = TRUE)
                            ))
                          })
    ) %>%
    rename(asq_light = x )
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#> Warning in mean.default(row_values, na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(row_values): argument is not numeric or logical:
#> returning NA
#....[truncated by user]
# ```

Created on 2019-04-18 by the reprex package (v0.2.1)

#15

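One main difference is that read.csv() converts character vectors to factors by default (this was the default before R 4.0.0, which changed it to FALSE); you can change this with the stringsAsFactors argument. The strings treated as missing values also differ: read_csv() (quite correctly) treats both "" and "NA" as NA by default, whereas read.csv() only recognises "NA" (na.strings = "NA"), so blank cells in text columns are kept as empty strings.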
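A minimal base-R sketch of both differences, using a small made-up CSV passed through the `text` argument (the column names and values are hypothetical). It sets `stringsAsFactors` explicitly and shows how adding `""` to `na.strings` makes `read.csv()` treat blank cells the way `read_csv()` does.

```r
# Hypothetical inline CSV: row 2 has blank cells, row 3 has a literal NA
csv_text <- "id,score,note\n1,5,ok\n2,,\n3,NA,NA\n"

# Text columns become factors when stringsAsFactors = TRUE
# (the read.csv() default before R 4.0.0)
d_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Text stays character; only the string "NA" counts as missing,
# so the blank note in row 2 is kept as an empty string
d_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Adding "" to na.strings also treats blank cells as missing
d_na <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                 na.strings = c("", "NA"))

str(d_factor$note)  # a factor with levels "" and "ok"
d_chr$note          # c("ok", "", NA)
d_na$note           # c("ok", NA, NA)
d_chr$score         # blank numeric cells are NA either way
```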

I haven't checked your reprex, but I noticed that adding stringsAsFactors = FALSE for data1 generates the same warnings. I don't know purrr (I'm still learning), so I can't tell what you're trying to do. I hope others will answer your question.


On a separate note, please familiarise yourself with this post:

#16

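For some reason read_csv() is reading some of those numeric columns as character, and that is why mean() returns NA with the warning "argument is not numeric or logical". I don't know why this is happening (the coercion warnings below do show that some cells contain text that can't be parsed as a number), but you can work around the problem by converting those columns to numeric afterwards. (Also, in this case you can use row-wise operations instead of the more complicated purrr syntax.)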

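library(tidyverse)

# Read the CSV straight from GitHub; quoted_na = FALSE keeps a quoted "NA"
# as literal text instead of treating it as missing
data2 <- read_csv(url('https://raw.githubusercontent.com/BrainStormCenter/ASQ_pilot/master/ASQ_pain_pilot_2019_04_18.csv'),
                  col_names = TRUE,
                  col_types = NULL,
                  quoted_na = FALSE)

data2 %>%
    # Force every asq_* column to numeric; text that can't be parsed becomes NA
    mutate_at(vars(starts_with("asq_")), as.numeric) %>%
    # Make each row its own group so mean() averages within one person
    rowwise() %>% 
    mutate(asq_light = mean(c(asq_1a, asq_2a, asq_3a, asq_4a, asq_5a,
                              asq_6a, asq_7a, asq_8a, asq_9a), na.rm = TRUE)) %>%
    # Drop the row-wise grouping once the row-wise step is done
    ungroup() %>% 
    select(asq_light, starts_with("asq_")) %>% 
    head(10)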
#> Warning: NAs introduced by coercion

#> Warning: NAs introduced by coercion

#> Warning: NAs introduced by coercion

#> Warning: NAs introduced by coercion
#> # A tibble: 10 x 31
#>    asq_light asq_1 asq_1a asq_2 asq_2a asq_3 asq_3a asq_4 asq_4a asq_5
#>        <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
#>  1    NaN       NA     NA    NA     NA    NA     NA    NA     NA    NA
#>  2    NaN        0     NA     0     NA     0     NA     0     NA     0
#>  3      4.71     1      5     1      6     1      4     0     NA     0
#>  4      1.89     1      1     1      2     1      2     1      2     1
#>  5      2.78     1      2     1      3     1      3     1      2     1
#>  6    NaN       NA     NA    NA     NA    NA     NA    NA     NA    NA
#>  7    NaN        0     NA     0     NA     0     NA     0     NA     0
#>  8      1.67     1      1     1      2     1      2     1      1     1
#>  9      3.25     1      2     1      4     1      3     1      2     1
#> 10      2.83     1      3     1      3     1      3     1      1     0
#> # … with 21 more variables: asq_5a <dbl>, asq_6 <dbl>, asq_6a <dbl>,
#> #   asq_7 <dbl>, asq_7a <dbl>, asq_8 <dbl>, asq_8a <dbl>, asq_9 <dbl>,
#> #   asq_9a <dbl>, asq_10 <dbl>, asq_10a <dbl>, asq_11 <dbl>,
#> #   asq_11a <dbl>, asq_12 <dbl>, asq_12a <dbl>, asq_13 <dbl>,
#> #   asq_13a <dbl>, asq_14 <dbl>, asq_14a <dbl>, asq_15 <dbl>,
#> #   asq_15a <dbl>

Created on 2019-04-19 by the reprex package (v0.2.1.9000)
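The `NaN` values in `asq_light` appear in rows where every `asq_*a` value is missing: with `na.rm = TRUE` nothing is left to average, and the mean of an empty numeric vector is `NaN`. A minimal base-R sketch on made-up scores, plus one way (an assumption about what you want, not part of the answer above) to recode those cases as plain `NA`:

```r
# Hypothetical per-person scores; the second person answered nothing
scores <- list(c(1, 2, NA), c(NA_real_, NA, NA))

# With na.rm = TRUE the all-NA vector has zero values left, giving NaN
raw_means <- vapply(scores, mean, numeric(1), na.rm = TRUE)
raw_means    # 1.5 NaN

# is.nan() is TRUE only for NaN, so "no data at all" becomes NA
clean_means <- ifelse(is.nan(raw_means), NA_real_, raw_means)
clean_means  # 1.5 NA
```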

#17

Thanks for the information and additional link.
As for my goal in creating the variable, I should have mentioned it! I'm sure there is a better way to accomplish my goals, especially since I don't understand everything the commands are doing. I just adapted something I found on here that seemed appropriate.
Goals

  1. Create a new variable called "asq_light" that is an average of each person's available asq_1a - asq_9a scores.
  2. Create a new variable called "asq_heavy" that is an average of each person's available asq_10a - asq_15a scores.

Once created, I will use these values in the correlation analyses I am trying to accomplish.

Thanks for all the help thus far.
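One compact way to get both averages, sketched in base R with `rowMeans()` on a small made-up data frame (the `asq_*a` names follow the thread; the values and the helper names `light_cols`/`heavy_cols` are hypothetical). It assumes the columns are already numeric. Building the column names explicitly avoids a range such as `asq_1a:asq_9a`, which in the real data would also pick up the interleaved `asq_1`, `asq_2`, ... columns.

```r
# Hypothetical data: three people, items asq_1a ... asq_15a
item_names <- paste0("asq_", 1:15, "a")
df <- as.data.frame(matrix(1:45, nrow = 3,
                           dimnames = list(NULL, item_names)))
df$asq_2a[1] <- NA  # one missing answer

# Hypothetical helper vectors naming the items in each scale
light_cols <- paste0("asq_", 1:9, "a")
heavy_cols <- paste0("asq_", 10:15, "a")

# Average each person's available scores (NAs are skipped)
df$asq_light <- rowMeans(df[light_cols], na.rm = TRUE)
df$asq_heavy <- rowMeans(df[heavy_cols], na.rm = TRUE)

df[c("asq_light", "asq_heavy")]
```

Rows where every item is missing would still get `NaN` here, exactly as with `mean(..., na.rm = TRUE)`.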

#18

Thanks for the response and information about the error.
I am still learning how to convert columns to different data types (e.g., character to numeric, factor to numeric etc.).

Also, thanks for letting me know about row-wise operations. I didn’t know they existed, or even that I should look for them.

Regarding not knowing things, what is the ungroup() command doing? I didn’t see where you grouped anything.

Cheers,
Jason
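A minimal base-R sketch of the two conversions mentioned above, on made-up vectors. The key pitfall with factors: `as.numeric()` returns the internal level codes, not the printed values, so convert with `as.character()` first.

```r
# Hypothetical column that was read in as a factor
f <- factor(c("10", "5", "20"))

# Levels sort as text ("10", "20", "5"), so these are codes: 1 3 2
codes <- as.numeric(f)

# Going through character recovers the real values: 10 5 20
values <- as.numeric(as.character(f))

# Character to numeric: text that can't be parsed becomes NA,
# with the warning "NAs introduced by coercion"
x <- c("3", "4.5", "n/a")
x_num <- as.numeric(x)  # 3 4.5 NA
```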

#19

After rowwise(), the data is grouped by row, so it's good practice to ungroup() it once you are done with row-wise operations; otherwise later steps (summarise(), for example) would keep working one row at a time.
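A minimal sketch of what `ungroup()` changes, on toy data and assuming dplyr is installed: while the data is still row-wise, `summarise()` returns one row per original row; after `ungroup()` it summarises the whole column.

```r
library(dplyr)

# Hypothetical data
df <- tibble(x = c(1, 2, 3), y = c(4, 5, 6))

rw <- df %>%
  rowwise() %>%                     # every row becomes its own group
  mutate(row_mean = mean(c(x, y)))  # so mean() sees one row at a time

# Still row-wise: one result per original row (3 rows)
per_row <- summarise(rw, total = sum(row_mean))

# After ungroup(): one result for the whole table (1 row)
overall <- rw %>% ungroup() %>% summarise(total = sum(row_mean))
```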

#20

That is very good to know. Also, why didn’t your code put the new asq_light variable at the end of the dataset, like my previous attempts?
Jason
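A minimal sketch of where the new column ends up, on toy data and assuming dplyr is installed: `mutate()` appends new columns at the end, while `select()` returns columns in the order they are listed. A call like `select(asq_light, starts_with("asq_"))` therefore puts `asq_light` first; `starts_with("asq_")` also matches it, but an already-selected column is not repeated.

```r
library(dplyr)

# Hypothetical data
df <- tibble(asq_1a = c(1, 3), asq_2a = c(2, 5))

# mutate() adds the new column after the existing ones
added <- df %>% mutate(asq_light = (asq_1a + asq_2a) / 2)
names(added)      # "asq_1a" "asq_2a" "asq_light"

# select() orders columns as listed
reordered <- added %>% select(asq_light, starts_with("asq_"))
names(reordered)  # "asq_light" "asq_1a" "asq_2a"
```

In dplyr 1.0.0 and later, `relocate()` and the `.before`/`.after` arguments of `mutate()` also control where a column is placed.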
