Create a list of column names

rstudio

#1

Hoping I can get some help here. I'm teaching myself R with some background in VBScript and PowerShell. I'm trying to read in a CSV file, pull the column names, massage them so that they match predefined requirements, and then recreate the CSV file with the new column names. I'm not looking for someone to write the script; the point I'm struggling with is that when I create a data frame or even an empty variable, it seems as though there are characters in it. What I'm currently getting is [1] "", and when I write the variable to the CSV that gets added to the file, which screws up everything.

Here is the code that I'm working with. Like I said, I'm not looking for someone to write the script, just an explanation of how to create an empty list, data frame, or variable that I can use.

rm(tmpcolnames)
partcolnamesDF <- list(read.csv("/home/david/Dropbox/TUN/new_participant_colnames.csv",header=FALSE, sep = ",",stringsAsFactors = FALSE))
tmpcolnames <- ""

#as.character(tmpcolnames[0])
#nchar(as.character(tmpcolnames[0]))
#newrow <- seq(39)
#r <- 0
#print(partcolnamesDF)
#print(tmpcolnames)
for (colname in partcolnamesDF)
{
  if(r < 1){
    tmpcolnames <- paste(tmpcolnames, tolower(colname), sep="")
  }
  else {
    tmpcolnames <- paste(tmpcolnames, tolower(colname), sep=",")
  }
  r <- r +1

}

print(tmpcolnames)

#  write.table(partcolnames, file = "/home/david/Dropbox/TUN/new_participant_lcolnames.txt", row.names=FALSE, sep=",")
write.csv(tmpcolnames, file = "/home/david/Dropbox/TUN/new_participant_lcolnames.txt")
#str(partcolnames)
#print(tmpcolnames)

This is the output written to the CSV file:

"","x"
"1",",security_category_name"
"2",",fiscal_year"
"3",",internal_event_name"
"4",",event_date"
"5",",participation_type_name"
"6",",team_name"
"7",",team_creation_date"
"8",",team_division"
"9",",team_id"
"10",",contact_id"
"11",",member_id"
"12",",participant_accept_email"
"13",",registration_date"
"14",",registration_active_status"
"15",",is_team_captain"
"16",",is_secondary_registration"
"17",",is_prior_participant"
"18",",emails_sent"
"19",",total_of_all_confirmed_gifts($)"
"20",",total_from_participant($)"
"21",",total_not_from_participant($)"
"22",",number_from_participant"
"23",",number_not_from_participant"
"24",",participant_email_status"
"25",",participant_employer"
"26",",participant_occupation"
"27",",participant_connection_to_ms"
"28",",address_participant_state/province"
"29",",address_participant_county"
"30",",address_participant_city"
"31",",address_participant_zip/postal_code"
"32",",registration_type"
"33",",event_id"
"34",",participant_gender"
"35",",participant_goal($)"
"36",",suggested_participant_goal($)"
"37",",source_code_type"
"38",",source_code_text"
"39",",sub_source_code_text"

#2

Hi,
Maybe you should use tmpcolnames = NULL instead of assigning "". This will create an empty variable.
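Here is a quick sketch of the difference. The variable names are made up for illustration and are not from your script. It shows that "" is a character vector of length one holding an empty string, while NULL and character(0) really are empty:

```r
# "" is NOT empty: it is a character vector holding one zero-length string
empty_string <- ""
length(empty_string)
#> [1] 1
nchar(empty_string)
#> [1] 0

# NULL and character(0) both have length zero
no_value <- NULL
length(no_value)
#> [1] 0
empty_chr <- character(0)
length(empty_chr)
#> [1] 0

# Growing an empty vector with c() does not leave a stray "" at the front
cols <- character(0)
for (nm in c("Fiscal_Year", "Event_Date")) {
  cols <- c(cols, tolower(nm))
}
cols
#> [1] "fiscal_year" "event_date"
```

Starting from character(0) rather than NULL also documents that the variable is meant to hold strings.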
Thank you!


#3

Hi! Welcome!

You seem to be doing a very common thing, which is to approach a problem in a new language the way you would solve it in the language you already know. This is an understandable strategy, but it can lead you to do things in an unnecessarily roundabout fashion.

Here's a typical way to approach this task in R:

# This line is only necessary to set up the example CSV!
write.csv(head(iris), "iris.csv", row.names = FALSE)

# Read in a CSV
iris_data <- read.csv("iris.csv", header = TRUE, stringsAsFactors = FALSE)

iris_data
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

# Perform manipulations on column names. Here, we'll convert 
# column names to lowercase and replace periods with underscores
names(iris_data) <- gsub("\\.", "_", tolower(names(iris_data)))

# Write a CSV with updated column names
write.csv(iris_data, "iris2.csv", row.names = FALSE)

Created on 2018-08-13 by the reprex package (v0.2.0).

Here's what iris2.csv looks like:

"sepal_length","sepal_width","petal_length","petal_width","species"
5.1,3.5,1.4,0.2,"setosa"
4.9,3,1.4,0.2,"setosa"
4.7,3.2,1.3,0.2,"setosa"
4.6,3.1,1.5,0.2,"setosa"
5,3.6,1.4,0.2,"setosa"
5.4,3.9,1.7,0.4,"setosa"
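If the file holds only a single line of column names, as yours seems to, the same vectorized idea works without a loop or a counter. This is only a sketch: the temporary file and the three example names are stand-ins I made up, not your actual data.

```r
# Hypothetical stand-in for a header-only file of column names
src <- tempfile(fileext = ".csv")
writeLines("Security_Category_Name,Fiscal_Year,Team.ID", src)

# header = FALSE gives a one-row data frame; unlist() flattens it
# into a plain character vector of names
raw_names <- unlist(
  read.csv(src, header = FALSE, colClasses = "character"),
  use.names = FALSE
)

# tolower() and gsub() are vectorized: they handle every name at once
clean_names <- gsub("\\.", "_", tolower(raw_names))
clean_names
#> [1] "security_category_name" "fiscal_year"            "team_id"

# collapse = joins the elements of one vector into a single string
header_line <- paste(clean_names, collapse = ",")
header_line
#> [1] "security_category_name,fiscal_year,team_id"
```

Note the difference between sep and collapse in paste(): sep goes between the arguments you pass in, while collapse joins the elements of the resulting vector.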

I'm afraid I don't follow the part about [1] "" being added to your file. I don't see any zero-length strings getting added to your CSV. Instead, your CSV output has row names, and that's just because write.csv() writes row names by default unless you set the row.names argument to FALSE.
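To see the row-name behaviour in isolation, here is a small sketch using a made-up two-row data frame and temporary files (none of these names come from your script):

```r
# Illustrative data: a single column called x, like in your output
names_df <- data.frame(x = c("fiscal_year", "event_date"))

# Default: write.csv() adds a row-name column with an empty "" header
with_rownames <- tempfile(fileext = ".csv")
write.csv(names_df, with_rownames)
cat(readLines(with_rownames), sep = "\n")
#> "","x"
#> "1","fiscal_year"
#> "2","event_date"

# row.names = FALSE drops that extra column
without_rownames <- tempfile(fileext = ".csv")
write.csv(names_df, without_rownames, row.names = FALSE)
cat(readLines(without_rownames), sep = "\n")
#> "x"
#> "fiscal_year"
#> "event_date"
```

The "" in the first header line is the header of the row-name column, not something that came from your variable.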