I have a very large dataset. I imported many .csv files, combined them with rbind, and then split the result into blocks of equal length based on the first column. It now looks like this:
Each of these is a tibble with 139 rows. The name of each tibble was created by the split function and corresponds to that tibble's info. I want to use ggplot with x = YEAR and y = RO_MM for each tibble, treating each one as a single series covering 1961-2099, so that I can compare that series against the others.
Is this possible? Or do I need to save each tibble in this new arrangement as a .csv and reimport/rbind them?
I guess it wasn't clear that I had already used rbind to combine all the .csvs of data and then used split to parse them into unique chunks/blocks based on their names in the first column. As such they are technically one dataframe, but are now read in unique blocks as per the data I showed above. Your answer uses two different dataframes.
Error in `colnames<-`(`*tmp*`, value = c("HUCs", "YEAR", "RO_MM")) :
  attempt to set 'colnames' on an object with less than two dimensions
So apparently my initial code (above) is bringing in the csvs and making them into a list and I need it to make them into a dataframe. Is there a way to do this?
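A minimal sketch of what is probably happening here, assuming the files were read with something like lapply(files, read.csv). The folder, file names and values below are made up for illustration. The result of lapply is a plain list, which has no dimensions, so colnames<- fails on it; binding the list first gives a data frame that accepts column names.

```r
# Hypothetical setup: write two small CSVs to a temporary folder
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(a = "huc_a", b = 1961:1963, c = c(10, 12, 11)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(a = "huc_b", b = 1961:1963, c = c(20, 19, 22)),
          file.path(tmp, "b.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# lapply returns a list of data frames, not a data frame
csv.list <- lapply(files, read.csv)
is.data.frame(csv.list)   # FALSE: this is why colnames<- complains

# Bind the list into one data frame, then set the column names
combined <- do.call(rbind, csv.list)
colnames(combined) <- c("HUCs", "YEAR", "RO_MM")
```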
Are you trying to bind the list of dataframes with identical fields to a single dataframe?
You could use do.call(rbind, df.list) or data.table::rbindlist(df.list), where df.list is your list of dataframes.
# setup
df.list <- list(df1 = data.frame(A = LETTERS[1:2], B = rnorm(2)),
                df2 = data.frame(A = LETTERS[3:5], B = rnorm(3)),
                df3 = data.frame(A = LETTERS[6:9], B = rnorm(4)))
# using do.call()
# the row names are concatenations of the list names
# and the row names of df1, df2, df3
df.stacked <- do.call(rbind, df.list)
# using data.table::rbindlist
# data.table::rbindlist(df.list)
In case you need to add the list names to the resulting dataframe, you can do it this way:
# With list names added as a column
df.named <- do.call(rbind,
                    lapply(seq_along(df.list), function(x) {
                      df.x <- df.list[[x]]          # data frame inside the list
                      df.x$N <- names(df.list)[x]   # add the list name as a column
                      df.x                          # return updated data frame
                    }))
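As a possible alternative, the same name column can be added in base R with Map, which pairs each data frame with its list name. This is only a sketch on a made-up list with fixed values (not the rnorm data above); data.table::rbindlist also accepts an idcol argument that does something similar, shown here only as a comment.

```r
# Hypothetical example list with fixed values for reproducibility
df.list <- list(
  df1 = data.frame(A = LETTERS[1:2], B = c(0.1, 0.2)),
  df2 = data.frame(A = LETTERS[3:5], B = c(0.3, 0.4, 0.5)),
  df3 = data.frame(A = LETTERS[6:9], B = c(0.6, 0.7, 0.8, 0.9))
)

# Map walks over the data frames and their names in parallel
add.name <- function(d, n) {
  d$N <- n   # recycle the list name down the new column
  d
}
df.named2 <- do.call(rbind, Map(add.name, df.list, names(df.list)))
rownames(df.named2) <- NULL   # drop the "df1.1"-style row names

# With data.table installed, this should give a comparable result:
# data.table::rbindlist(df.list, idcol = "N")
```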
The output
> # results
> df.list
$df1
A B
1 A -1.1183335
2 B 0.7666511
$df2
A B
1 C 1.4611863
2 D -1.2458959
3 E -0.8553016
$df3
A B
1 F 0.2747312
2 G 0.5511697
3 H -0.4734671
4 I -0.8334593
> df.stacked
A B
df1.1 A -1.1183335
df1.2 B 0.7666511
df2.1 C 1.4611863
df2.2 D -1.2458959
df2.3 E -0.8553016
df3.1 F 0.2747312
df3.2 G 0.5511697
df3.3 H -0.4734671
df3.4 I -0.8334593
> df.named
A B N
1 A -1.1183335 df1
2 B 0.7666511 df1
3 C 1.4611863 df2
4 D -1.2458959 df2
5 E -0.8553016 df2
6 F 0.2747312 df3
7 G 0.5511697 df3
8 H -0.4734671 df3
9 I -0.8334593 df3
Thank you, but I have been able to do that part. What I wanted to know was whether I could take the df.stacked data from your results, as an example, sort it by a grouping such as df1.1, df2.1, df3.1, df1.2, df2.2, df2.3, df3.1, df3.2, df3.3, give the groups names, for instance 1:3, and then have ggplot draw 1:3 as lines on a graph. I think maybe it can't be done in a simple way. I am going to export the restructured files that I have created into new csvs and then plot them. Thanks all.
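For what it's worth, here is a sketch of one way this might be done without exporting anything, assuming the stacked data keeps a column that identifies each block. ggplot2 can draw one line per group when that column is mapped to the group (and colour) aesthetic. The HUC names and RO_MM values below are invented for illustration; only the column names HUCs, YEAR and RO_MM and the 1961-2099 range come from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the combined data: 3 blocks of 139 years each
set.seed(1)
years <- 1961:2099
all.data <- data.frame(
  HUCs  = rep(c("huc_a", "huc_b", "huc_c"), each = length(years)),
  YEAR  = rep(years, times = 3),
  RO_MM = runif(3 * length(years), min = 0, max = 500)
)

# The list of blocks, as produced by split() on the first column
blocks <- split(all.data, all.data$HUCs)

# Stack the blocks back into one long data frame; the HUCs column
# still records which block every row came from
long <- do.call(rbind, blocks)

# One line per block: group and colour by the identifying column
p <- ggplot(long, aes(x = YEAR, y = RO_MM, colour = HUCs, group = HUCs)) +
  geom_line()
print(p)
```

If the data were never split in the first place, the same ggplot call should work directly on the combined data frame, since the grouping column is already there.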