unite 5 tables into 1 table

Hi, I'm new to all this and I find it fascinating!!!

I need help to understand something I want to do but I don't know how...

For sure is a very easy thing...

I have 12 CVS files, monthly sales, each one is a month and I want to make a new table called year_2020. In that table, I want to put all the months together.

I don't know how to make a join but they don't share a second key. I don't want to join the information I want all the information in one table if that's passible?

I would guess you want to join them row-wise. In the following example, I stack three data frames with the rbind() function. Does that work for you?

#Invent some data
DF1 <- data.frame(Name = c("A","B","C"), Value = 1:3)
DF2 <- data.frame(Name = c("D","F","H"), Value = 7:9)
DF3 <- data.frame(Name = c("A","F","B"), Value = 11:13)

#Bind the three data frames
AllDat <- rbind(DF1,DF2,DF3)
AllDat
  Name Value
1    A     1
2    B     2
3    C     3
4    D     7
5    F     8
6    H     9
7    A    11
8    F    12
9    B    13
1 Like

Thx,

I solve it like this:

year_2020 <- full_join(Apr_2020, May_2020) %>%
full_join(Jun_2020) %>%
full_join(Jul_2020) %>%
full_join(Ago_2020) %>%
full_join(Sep_2020) %>%
full_join(Oct_2020) %>%
full_join(Nov_2020) %>%
full_join(Dic_2020)

I Was trying something like this:

year_2020 <- full_join(Apr_2020, May_2020, bla , bla...)

But it can't be done like that... Thks very much

Now I understand what you wrote, that's exactly what I wanted to do... Thx

Now I have the problem that one of the data types is in character with NA and I need it in double, but it doesn't let me change it because of the NA.

Then I tried to change tho NA for 0's but it doesn't let me because it is a character type...

I'm looking on the internet for what I can do...

Having an NA in a character column should not prevent you from converting the other values to be numeric. For example,

DF <- data.frame(Value = c("1.2", "215.45", NA, "-57.91"))
DF$Value <- as.numeric(DF$Value)
summary(DF)
     Value       
 Min.   :-57.91  
 1st Qu.:-28.36  
 Median :  1.20  
 Mean   : 52.91  
 3rd Qu.:108.33  
 Max.   :215.45  
 NA's   :1       

If you post a little of the data, someone here can probably find the problem. Assuming that the data frame is called DF, you can post the output of

dput(head(DF))

Put a line with three back ticks just before and after the posted output, like this
```
Output of dput() goes here
```

1 Like

Thxs

Actually, today I tried with the "rbind()" and It didn't throw me a mistake...

Then I call View() and everything seems in place. I'm gonna keep cleaning it and see what happens in the future. Thx very very much for your help...

Hi, I don't know if you can still help me.

I have accomplished to put all the months into years, then I tried to make one data with all the info but I think is too much information because my Rstudio crashed when I did the rbind () for the three years.

So I decided to take some columns away and just keep the ones that I need. So I tried this:

year_2021_Sm <- year_2021 %>% 
  select(rideable_type, started_at, ended_at, start_station_name, end_station_name, member_casual) %>% 
  View()
```
But It doesn't create new data it creates a Value with Null in the environment pane...

I have been looking around for how to do it, I tried subset, but it tells me that cannot find the functionyear_2020, which is the data that I'm trying to make smaller. 

Also tried data.frame but that one went really bad... hehe

Can you recommend to me how to make new data but with fewer columns?

Do not put View() at the end of your chain of functions. That give year_2021_Sm the value returned by View(), which seems to be NULL. That is, run

year_2021_Sm <- year_2021 %>% 
  select(rideable_type, started_at, ended_at, start_station_name, end_station_name, member_casual)

If you take any data frame, say DF, and do this

NewDF <- DF |> View()

you will see that NewDF is NULL

1 Like

Ok. Thx

I won't do it any more

Now Im having trouble to make operations between time...

I have a start time and end time.

I'm coding:

full_data$Trip_Time <- (full_data$started_at - full_data$ended_at) 

The return is in seconds. Do you know how I can format it to be H:M?

You can use the hms package to display seconds as H:M:S. The underlying value remains in seconds. For example:

library(hms)
as_hms(3780)
01:03:00
as_hms(96300)
26:45:00
1 Like

thx you very very much!!!!

Hi, me again... I hope you can help me understand what I'm doing wrong now...

This is the str() of the df:
tibble [9,584,529 × 13] (S3: tbl_df/tbl/data.frame)
rideable_type : chr [1:9584529] "docked_bike" "docked_bike" "docked_bike" "docked_bike" ... start_station_name : chr [1:9584529] "Eckhart Park" "Drake Ave & Fullerton Ave" "McClurg Ct & Erie St" "California Ave & Division St" ...
end_station_name : chr [1:9584529] "Lincoln Ave & Diversey Pkwy" "Kosciuszko Park" "Indiana Ave & Roosevelt Rd" "Wood St & Augusta Blvd" ... member_casual : chr [1:9584529] "member" "member" "member" "member" ...
Start_Day : chr [1:9584529] "26" "17" "01" "07" ... End_Day : chr [1:9584529] "26" "17" "01" "07" ...
Start_DayOfWeek : chr [1:9584529] "Sun" "Fri" "Wed" "Tue" ... End_DayOfWeek : chr [1:9584529] "Sun" "Fri" "Wed" "Tue" ...
Start_Day_MonthName: chr [1:9584529] "Apr" "Apr" "Apr" "Apr" ... Start_Day_Year : chr [1:9584529] "20" "20" "20" "20" ...
Start_Day_Time : chr [1:9584529] "17:45" "17:08" "17:54" "12:50" ... End_Day_Time : chr [1:9584529] "18:12" "17:17" "18:08" "13:02" ...
$ Trip_Time : 'hms' num [1:9584529] 00:26:49 00:08:09 00:14:23 00:12:12 ...
..- attr(*, "units")= chr "secs"

im trying to

group_by(fd$Start_Day_Year)

And this is the answer that it returns:

group_by(fd$Start_Day_Year)
Error in UseMethod("group_by") :
no applicable method for 'group_by' applied to an object of class "character"

> str(fd)
tibble [9,584,529 × 13] (S3: tbl_df/tbl/data.frame)
 $ rideable_type      : chr [1:9584529] "docked_bike" "docked_bike" "docked_bike" "docked_bike" ...
 $ start_station_name : chr [1:9584529] "Eckhart Park" "Drake Ave & Fullerton Ave" "McClurg Ct & Erie St" "California Ave & Division St" ...
 $ end_station_name   : chr [1:9584529] "Lincoln Ave & Diversey Pkwy" "Kosciuszko Park" "Indiana Ave & Roosevelt Rd" "Wood St & Augusta Blvd" ...
 $ member_casual      : chr [1:9584529] "member" "member" "member" "member" ...
 $ Start_Day          : chr [1:9584529] "26" "17" "01" "07" ...
 $ End_Day            : chr [1:9584529] "26" "17" "01" "07" ...
 $ Start_DayOfWeek    : chr [1:9584529] "Sun" "Fri" "Wed" "Tue" ...
 $ End_DayOfWeek      : chr [1:9584529] "Sun" "Fri" "Wed" "Tue" ...
 $ Start_Day_MonthName: chr [1:9584529] "Apr" "Apr" "Apr" "Apr" ...
 $ Start_Day_Year     : chr [1:9584529] "20" "20" "20" "20" ...
 $ Start_Day_Time     : chr [1:9584529] "17:45" "17:08" "17:54" "12:50" ...
 $ End_Day_Time       : chr [1:9584529] "18:12" "17:17" "18:08" "13:02" ...
 $ Trip_Time          : 'hms' num [1:9584529] 00:26:49 00:08:09 00:14:23 00:12:12 ...
  ..- attr(*, "units")= chr "secs"

Change the dollar sign for a comma

$ ,

1 Like

So the syntax of the function is what I'm doing wrong

Thax :pray: :upside_down_face:

> fd %>% 
+   filter(member_casual == 'member') %>% 
+   as_hms(sum(Trip_Time))
Error in `df_cast_opts()`:
! Data frame must have names.
ℹ In file type-data-frame.c at line 683.
ℹ Install the winch package to get additional debugging info the next time you get this error.
ℹ This is an internal error in the rlang package, please report it to the package authors.
Backtrace:
    ▆
 1. ├─fd %>% filter(member_casual == "member") %>% ...
 2. ├─hms::as_hms(., sum(Trip_Time))
 3. ├─hms:::as_hms.default(., sum(Trip_Time))
 4. │ └─vctrs::vec_cast(x, new_hms())
 5. │   └─vctrs `<fn>`()
 6. │     └─vctrs::vec_default_cast(...)
 7. │       └─vctrs:::df_cast_opts(...)
 8. └─rlang:::stop_internal_c_lib(...)
 9.   └─rlang::abort(message, call = call, .internal = TRUE)
Error:
! Arguments in `...` must be used.
✖ Problematic argument:
• ..1 = sum(Trip_Time)
Run `rlang::last_error()` to see where the error occurred.

Do you or anyone knows what does this means???

Eudcated guess...

fd %>%
filter(member_casual == 'member') %>% 
mutate(newvar=  as_hms(sum(Trip_Time)))
1 Like

Very well educated guess!!!

You are the Master!!!

Thx very very much!!!

Amazing community!!!!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.