Write to CSV, column of string values

archsteve · October 6, 2018, 5:18am

Update: The real problem I have is using cbind to create a matrix:

salmonj<-cbind( Community_Name=Community_Name,Community_Code,header=TRUE,Strings)

How do I preserve the Community_Name as a string in my new matrix?

Thank you!

Hi All, this is a newbie question. I can easily write my df to a .csv file. However, one of the columns in the df I am using holds strings: the names of communities. Is there a trick I am missing? When I run the code nothing happens, except the strings are converted to a factor. Here is an example of some code I've tried.

write(mydateframe, file = "Desktop/salmon/MyData.csv",
append = FALSE, sep = " ")

Thank you.
Steve

kmprioli · October 6, 2018, 1:10pm

Hi Steve, and welcome to community.rstudio.com! To help you get the right help for your question, can you please turn it into a reprex (reproducible example)? This will ensure we're all looking at the same data and code. A guide for creating a reprex can be found here.

How is your data organized? What are you seeing (errors and/or output)?

Below I've created some dummy data to illustrate binding these into a dataframe and exporting to .csv.

library(tidyverse)

# Creating dummy data

Community_Name <- c("Mars", "Saturn", "Mercury", "Neptune")
Community_Code <- c(4, 6, 1, 8)

# Binding into a dataframe

community_df <- as.data.frame(cbind(Community_Name, Community_Code), stringsAsFactors = FALSE)

# Inspecting the structure of the dataframe

str(community_df)
#> 'data.frame':    4 obs. of  2 variables:
#>  $ Community_Name: chr  "Mars" "Saturn" "Mercury" "Neptune"
#>  $ Community_Code: chr  "4" "6" "1" "8"

# Exporting to .csv

community_out <- write_csv(community_df, "community_out.csv")

Created on 2018-10-06 by the reprex package (v0.2.0).
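One thing to notice in the reprex above: because cbind() builds a character matrix first, Community_Code comes out as chr rather than numeric. A minimal base-R sketch of an alternative, assuming the same dummy vectors, is to call data.frame() directly so each column keeps its own type. The names community_df2, out_file and round_trip are made up for this illustration, and the file goes to a temporary path rather than a real project folder.

```r
# Same dummy data as in the reprex above
Community_Name <- c("Mars", "Saturn", "Mercury", "Neptune")
Community_Code <- c(4, 6, 1, 8)

# data.frame() keeps each column's own type (no intermediate matrix).
# stringsAsFactors = FALSE is spelled out because R versions before 4.0
# convert strings to factors by default.
community_df2 <- data.frame(Community_Name, Community_Code,
                            stringsAsFactors = FALSE)
str(community_df2)

# Write to a temporary file (hypothetical path) and read it back
out_file <- tempfile(fileext = ".csv")
write.csv(community_df2, out_file, row.names = FALSE)
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)
str(round_trip)
```

Passing stringsAsFactors = FALSE to read.csv() is also what keeps the names as character when the file is read back in.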

I hope this helps!

jcblum · October 7, 2018, 1:11am

I’m not sure if you posted this update before or after @kmprioli’s reply, but just to be clear: a matrix in R can only contain data of a single type. The structure that can contain data of multiple types (e.g., text, factors, numeric, etc) is a data frame. See here for a table summarizing the basics of R’s data structures: Data structures · Advanced R.
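To make the matrix versus data frame difference concrete, here is a small sketch. The vectors name and code are hypothetical stand-ins, not the real salmon data.

```r
# Hypothetical vectors, for illustration only
name <- c("Mars", "Saturn")
code <- c(4, 6)

# A matrix holds a single type, so the numbers are coerced to character
m <- cbind(name, code)
is.matrix(m)
typeof(m)          # "character"

# A data frame stores each column with its own type
df <- data.frame(name, code, stringsAsFactors = FALSE)
sapply(df, class)  # name: "character", code: "numeric"
```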

Beyond that, I’m struggling to understand exactly what you’re doing, so I’m going to echo @kmprioli’s suggestion — a reproducible example would really help! At the very least, can you show more of the code you’re working on? It’s ok if it doesn’t work correctly!

archsteve · October 7, 2018, 10:19pm

Okay, I am trying to follow. Basically I am reading data from a .csv file. In that file there are a variety of data types (string and factor). When I read that .csv, the strings and factors seem to be preserved. When I try to cbind that with the as.data.frame command, the data is all converted to factor data. I am trying to create a link so I can let you look at it through my github. So far that is not working. I know, I am very new at all this. I appreciate the assistance. Cheers.

> library(tidyverse)
> 
> #tell where the data come from
> community_all<- read.csv("Desktop/salmon/community_data.csv", header=TRUE, dec=".")
> str(community_all)
'data.frame':	553 obs. of  7 variables:
 $ Project_ID    : int  19 67 62 85 171 180 124 205 82 201 ...
 $ Project_Name  : Factor w/ 143 levels "","Akhiok 1992",..: 73 74 75 2 46 76 3 42 8 18 ...
 $ RegionCode    : int  3 3 3 3 3 3 4 4 3 3 ...
 $ Region        : Factor w/ 6 levels "Arctic","Interior",..: 5 5 5 5 5 5 6 6 5 5 ...
 $ Community_Code: int  2 2 2 2 2 2 3 4 5 5 ...
 $ Community_Name: Factor w/ 255 levels "Akhiok","Akiachak",..: 1 1 1 1 1 1 2 3 4 4 ...
 $ Study_Year    : int  1982 1986 1989 1992 2003 2004 1998 2010 1990 2008 ...
>
> community_sub1<-as.data.frame(cbind(Community_Name,Community_Code, stringsAsFactors = FALSE))
> str(community_sub1)
'data.frame':	553 obs. of  3 variables:
 $ Community_Name  : int  1 1 1 1 1 1 2 3 4 4 ...
 $ Community_Code  : int  2 2 2 2 2 2 3 4 5 5 ...
 $ stringsAsFactors: int  0 0 0 0 0 0 0 0 0 0 ...

jcblum · October 8, 2018, 2:12am

So I'm guessing from your code that you're trying to subset the big community_all data frame? I see you've loaded the tidyverse packages at the start, but you aren't (yet) using any of their tools. Here's how you would subset community_all, tidyverse-style:

select(community_all, Community_Name, Community_Code)

The output will be a data frame with just those two columns in it. To get a handle on using select() and the other dplyr data-wrangling "verbs", start here: 5 Data transformation | R for Data Science. A major benefit of using these tools is that they abstract away a bunch of the complexity I'm about to go into below!

Why didn't the code you tried work?

cbind() tries to guess whether you want data frame or matrix output based on what you pass to it, and then it uses totally different code under the hood based on what it guessed. At least one argument has to be a data frame for it to use the data frame method (the only one where stringsAsFactors means anything). You might be a little surprised to hear that you didn't actually pass any data frames to cbind()!
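A quick way to see this dispatch for yourself, using made-up objects (nm, cd and df are hypothetical names for this sketch):

```r
# Hypothetical toy inputs
nm <- c("a", "b")
cd <- c(1L, 2L)
df <- data.frame(nm, stringsAsFactors = FALSE)

# Only bare vectors: cbind() uses the matrix method
res_vec <- cbind(nm, cd)
is.matrix(res_vec)      # TRUE

# At least one data frame: cbind() uses the data frame method
res_df <- cbind(df, cd)
is.data.frame(res_df)   # TRUE
str(res_df)
```

In the second call the integer column stays an integer, because nothing had to be squeezed into a single-type matrix.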

I'm not totally sure what you did pass, because your code as written shouldn't work. Based on what you posted, referring to Community_Name and Community_Code should have caused an "object not found" error. I'm guessing that you did something earlier in your session that made these objects exist (or appear to exist) on their own. You could have created new vectors with those names, or you might have used attach(community_all).

A brief tour of R subsetting, and a warning about attach()...

# Working with the built-in `mtcars` dataset...
str(mtcars)
#> 'data.frame':	32 obs. of  11 variables:
#>  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
#>  $ disp: num  160 160 108 258 360 ...
#>  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec: num  16.5 17 18.6 19.4 17 ...
#>  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
#>  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
#>  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
#>  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

# Dollar-sign subsetting extracts a vector from a data frame
str(mtcars$cyl)
#>  num [1:32] 6 6 4 6 8 6 8 4 4 6 ...

# Double-bracket subsetting extracts a vector from a data frame
str(mtcars[["cyl"]])
#>  num [1:32] 6 6 4 6 8 6 8 4 4 6 ...

# Single bracket subsetting preserves the data frame structure
str(mtcars["cyl"])
#> 'data.frame':    32 obs. of  1 variable:
#>  $ cyl: num  6 6 4 6 8 6 8 4 4 6 ...

# Variables called on their own thanks to `attach()` act like
# dollar-sign or double-bracket subsetting (extracts a vector)
attach(mtcars)
str(cyl)
#>  num [1:32] 6 6 4 6 8 6 8 4 4 6 ...

Created on 2018-10-07 by the reprex package (v0.2.1).

I advise you to avoid using attach(). It's a convenience function that lets you refer to vectors from a data frame without using the dataframe_name$ prefix. This seemed like a great idea once upon a time because it saved typing, so you see it a lot in R examples of a certain vintage. But it can cause all sorts of confusion and errors, because it effectively creates a bunch of invisible objects that you have to remember are there (they won't show up in your workspace). I strongly recommend avoiding it until you are sure you know what you're doing and understand the dangers (and by then, chances are you will have come up with your own reasons to avoid attach()).
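Here is one concrete way attach() can bite, sketched with a hypothetical toy data frame (df, x and stale are made-up names). attach() copies the columns onto the search path, so later changes to the data frame are not reflected in the attached copy.

```r
# Hypothetical toy data frame
df <- data.frame(x = 1:3)

attach(df)
df$x <- df$x * 10   # modify the data frame after attaching it

x                   # still 1 2 3: the attached copy was not updated
stale <- x          # keep the stale values for comparison

detach(df)          # always clean up after attach()

# with() evaluates an expression against the current data frame instead
with(df, x)         # 10 20 30
```

with() or plain df$x gives the same typing convenience in most cases, without leaving anything behind on the search path.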

One way or another, you managed to pass individual vectors to cbind(), so it used the matrix method. The matrix method doesn't know what to do with stringsAsFactors, so it assumed that was just another vector that you were passing in, one that only has FALSE values.

Like I mentioned above, matrices can only have a single type of data, so R had to convert everything in the matrix into one type. Per the documentation, cbind() does this as follows:

The type of a matrix result determined from the highest type of any of the inputs in the hierarchy raw < logical < integer < double < complex < character < list .

So you seem to have passed cbind() a vector of factor data, a vector of integer data, and a vector of logical data. Factors are integers under the hood, so as a result of the above hierarchy, you got a matrix of integers (FALSE converts to integer 0). Subsequently converting the matrix to a data frame can't reverse that operation, so you just wind up with a data frame of integers.
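The coercion described above can be reproduced on a small scale. This is only a sketch with invented stand-in vectors (three rows instead of 553), assuming the originals were a factor and an integer vector as the str() output suggests.

```r
# Hypothetical stand-ins for the two vectors
Community_Name <- factor(c("Akhiok", "Akhiok", "Akiachak"))
Community_Code <- c(2L, 2L, 3L)

# The matrix method treats stringsAsFactors as a third input vector
m <- cbind(Community_Name, Community_Code, stringsAsFactors = FALSE)
typeof(m)     # "integer": factor codes, integers, and FALSE as 0
colnames(m)

# Converting afterwards cannot bring the community names back
str(as.data.frame(m))
```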

Whew! If you're still with me, one more point... While what you tried is creative, you might have already started to realize that using cbind() was sort of going out of your way if all you wanted to do was subset your data frame. Here's an example of how this is more typically done in base R: Getting a subset of a data structure
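For reference, a minimal base-R sketch of that kind of subsetting. The data frame here is a tiny hypothetical stand-in for community_all with only three of its columns, and community_sub is a made-up name.

```r
# Tiny hypothetical stand-in for community_all
community_all <- data.frame(
  Project_ID     = c(19L, 67L, 124L),
  Community_Code = c(2L, 2L, 3L),
  Community_Name = c("Akhiok", "Akhiok", "Akiachak"),
  stringsAsFactors = FALSE
)

# Select columns by name; the result is still a data frame
# and each column keeps its original type
community_sub <- community_all[, c("Community_Name", "Community_Code")]
str(community_sub)
```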

archsteve · October 8, 2018, 2:36am

Ah, yes, this is helpful. Thank you for your thorough information! I think I am following.