Combining CSV files column names error

MTKEnter · December 7, 2022, 5:32pm

I am trying to combine two CSV files. I loaded CSV1, a weekly export, and want to append it to CSVC, the much larger combined yearly file:

CSV1 <- read.csv(path, check.names = FALSE, row.names = NULL)
CSVC <- read.csv(path, check.names = FALSE, row.names = NULL)

I then tried to combine the files with rbind:

CSVF <- rbind(CSVC, CSV1)

but it returns an error saying that the numbers of columns of the arguments do not match.

When I run colnames(CSVC) to list the columns, it shows an NA column name at the end, but I am not sure whether this is causing the issue.
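A quick way to see why rbind() refuses to combine two data frames is to compare their column counts and names before binding. The sketch below uses two small hypothetical data frames, `yearly` and `weekly`, as stand-ins for CSVC and CSV1; the NA name on the third column is set by hand to mimic what colnames(CSVC) reported.

```r
# Hypothetical stand-ins for CSVC (yearly) and CSV1 (weekly).
yearly <- data.frame(Date = c("2022-11-01", "2022-11-02"), Sales = c(100, 150), x = NA)
names(yearly)[3] <- NA  # mimic the NA column name reported by colnames(CSVC)
weekly <- data.frame(Date = "2022-12-01", Sales = 200)

# rbind() on data frames needs the same number of columns in every argument.
print(c(yearly = ncol(yearly), weekly = ncol(weekly)))

# Positions of columns whose name is missing.
print(which(is.na(names(yearly))))

# Names present in the yearly frame but not in the weekly one.
print(setdiff(names(yearly), names(weekly)))

# Reproduce the error and capture its message instead of stopping.
msg <- tryCatch(rbind(yearly, weekly), error = conditionMessage)
print(msg)
```

If the counts differ by exactly one and the odd column out has an NA name, that extra column is the thing to track down before binding.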

Any help would be greatly appreciated.

scottyd22 · December 7, 2022, 5:47pm

Try using dplyr::bind_rows(CSVC, CSV1)

MTKEnter · December 7, 2022, 5:53pm

I should have mentioned that I want to append the rows of CSV1 to the bottom of CSVC. I didn't think bind_rows would work for that, since I understood it to merge right to left, which is why I was using rbind.

scottyd22 · December 7, 2022, 6:28pm

As shown in the examples below, bind_rows() adds the second data frame listed to the bottom of the first.

library(dplyr)

df1 = mtcars[1:5, 1:6] %>% mutate(data = 'df1')
df2 = mtcars[6:10, 1:4] %>% mutate(data = 'df2')

bind_rows(df1, df2)
#>                    mpg cyl  disp  hp drat    wt data
#> Mazda RX4         21.0   6 160.0 110 3.90 2.620  df1
#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875  df1
#> Datsun 710        22.8   4 108.0  93 3.85 2.320  df1
#> Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215  df1
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440  df1
#> Valiant           18.1   6 225.0 105   NA    NA  df2
#> Duster 360        14.3   8 360.0 245   NA    NA  df2
#> Merc 240D         24.4   4 146.7  62   NA    NA  df2
#> Merc 230          22.8   4 140.8  95   NA    NA  df2
#> Merc 280          19.2   6 167.6 123   NA    NA  df2

bind_rows(df2, df1)
#>                    mpg cyl  disp  hp data drat    wt
#> Valiant           18.1   6 225.0 105  df2   NA    NA
#> Duster 360        14.3   8 360.0 245  df2   NA    NA
#> Merc 240D         24.4   4 146.7  62  df2   NA    NA
#> Merc 230          22.8   4 140.8  95  df2   NA    NA
#> Merc 280          19.2   6 167.6 123  df2   NA    NA
#> Mazda RX4         21.0   6 160.0 110  df1 3.90 2.620
#> Mazda RX4 Wag     21.0   6 160.0 110  df1 3.90 2.875
#> Datsun 710        22.8   4 108.0  93  df1 3.85 2.320
#> Hornet 4 Drive    21.4   6 258.0 110  df1 3.08 3.215
#> Hornet Sportabout 18.7   8 360.0 175  df1 3.15 3.440

Created on 2022-12-07 with reprex v2.0.2.9000

MTKEnter · December 7, 2022, 7:21pm

I'm not sure what I was doing wrong the first time around, but this worked for combining the files. However, the last column, "Profit - %", shows NA instead of the numbers. Do I need to set the column type to numeric in read.csv so that it shows the actual values instead of a logical NA?
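Setting the type when reading is possible with the colClasses argument of read.csv(), but it only controls how values that are present get parsed. The sketch below uses a made-up file whose "Profit - %" field is blank on every row: by default that column is guessed as logical, and forcing it to numeric gives a numeric column that is still all NA, because there was nothing to parse.

```r
# Made-up file whose "Profit - %" field is blank on every row.
txt <- "Sales,Profit - %
100,
200,"

# Default: an all-blank column is guessed as logical.
default_types <- read.csv(text = txt, check.names = FALSE)
print(sapply(default_types, class))

# colClasses (given here in column order) forces the types...
forced <- read.csv(text = txt, check.names = FALSE,
                   colClasses = c("integer", "numeric"))
print(sapply(forced, class))

# ...but the values are still NA, since the field itself is empty.
print(forced[["Profit - %"]])
```

So a logical, all-NA column is usually a sign that the values are missing from that field (or sitting under a different column name), not just a type problem.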

scottyd22 · December 7, 2022, 7:59pm

Is "Profit - %" in both CSVC and CSV1? If possible, can you please share what you get with the two commands below? This will provide the first 5 rows of each data set.

dput(head(CSV1, 5))
dput(head(CSVC, 5))

MTKEnter · December 7, 2022, 8:38pm

Yes, it is. There are 22 columns and over 1M rows. When I read the CSV files with read.csv, they come in with 23 variables: a row-counter column is added as the first column, and the Profit - % column becomes logical NA instead of numeric. Is there a way to read the files without the counter column being added and with Profit - % read as numeric?
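One common cause of exactly these symptoms (an extra first column plus a last column that is all logical NA) is that each data line has one more field than the header line, for example because every row ends with a trailing comma. Whether these exports actually look like that is an assumption. In that case read.csv() with row.names = NULL keeps the extra field as a column named "row.names", and every header name lands one column too far to the right. A minimal reproduction with made-up data:

```r
# Made-up export: each data line ends with a trailing comma, so data rows
# have 4 fields while the header line has only 3 (an assumption).
txt <- "Date,Sales,Profit - %
2022-12-01,100,0.25,
2022-12-02,200,0.30,"

weekly <- read.csv(text = txt, check.names = FALSE, row.names = NULL)

# An extra leading column named "row.names" appears...
print(names(weekly))

# ...and every header is shifted one column to the right:
print(weekly$Date)                     # actually holds the Sales values
print(weekly[["Profit - %"]])          # the empty trailing field
print(class(weekly[["Profit - %"]]))   # read as logical
```

Opening the raw file in a text editor and comparing the number of commas on the header line with a data line is a quick way to confirm or rule out this cause.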

I tried to post the output of dput(head(CSV1, 5)), but it got removed.

MTKEnter · December 7, 2022, 10:06pm

Figured it out: all I needed was names(CSV1) = names(CSV1)[-1] to drop the extra row-counter name so the column names line up with the data again, and then rbind did the trick. Thanks.
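Dropping the first name works because it shifts every header back onto its own data. The sketch below applies the same idea to made-up data, assuming the shift comes from a trailing comma on each data line. Assigning a shorter names vector leaves the empty last column with an NA name; the variant also removes that column so the frame has exactly the header's columns. The `yearly` frame is hypothetical, standing in for CSVC.

```r
# Made-up export whose data lines end with a trailing comma (an assumption),
# which makes read.csv() shift every column name one place to the right.
txt <- "Date,Sales,Profit - %
2022-12-01,100,0.25,
2022-12-02,200,0.30,"
weekly <- read.csv(text = txt, check.names = FALSE, row.names = NULL)

# The thread's fix: drop the first name so the rest shift back left.
# The shorter names vector is padded with NA for the last (empty) column.
shifted <- weekly
names(shifted) <- names(shifted)[-1]
print(names(shifted))

# Variant that also removes the empty trailing column.
fixed <- weekly[, -ncol(weekly)]
names(fixed) <- names(weekly)[-1]

# A hypothetical yearly frame with the same three columns now binds cleanly.
yearly <- data.frame(Date = "2022-11-30", Sales = 90L, `Profit - %` = 0.2,
                     check.names = FALSE)
combined <- rbind(yearly, fixed)
print(combined)
```

Because rbind() on data frames matches columns by name, both frames need identical, non-missing names; removing the empty column avoids carrying an NA-named column into the yearly file.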

system · December 28, 2022, 10:07pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.