What's the difference between getting a table and getting a value?

The first piece of code gives me an object listed under Data, while the second gives me one listed under Values.

```r
post.treat.mean <- df %>% filter(post == 1, treatment == 1) %>%
  summarize(mean = mean(daily_kwh))

sd_tr <- sd(df$daily_kwh[df$post == 1 & df$treatment == 1], na.rm = TRUE)
```

Why is there a difference?

Also, I'm a novice in R. Can anyone tell me what %>% means?
Third question: what do ```r and ``` mean? Can I change the r after ``` to another letter?
Fourth question: after running the nrow() function, I can see the result under Values. It's the number 105246L. What does the L mean?

The first result is actually a "data frame": rectangular data with columns and rows. The second is a single scalar value.
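A minimal base-R sketch of the two results side by side. It assumes a small made-up `df` with the same column names as in the question (the values are invented), and uses base R's `data.frame()` to stand in for the `summarize()` output, since `summarize()` on an ungrouped data frame also returns a one-row data frame.

```r
# Hypothetical example data; values are invented for illustration only
df <- data.frame(
  post      = c(1, 1, 1, 0),
  treatment = c(1, 1, 0, 1),
  daily_kwh = c(10, 14, 7, 9)
)

# Logical index of rows with post == 1 and treatment == 1
rows <- df$post == 1 & df$treatment == 1

# Like summarize(): the mean is wrapped in a one-row, one-column data frame
post.treat.mean <- data.frame(mean = mean(df$daily_kwh[rows]))

# Like the sd() line: the result is a bare numeric vector of length 1
sd_tr <- sd(df$daily_kwh[rows], na.rm = TRUE)

class(post.treat.mean)  # "data.frame"
dim(post.treat.mean)    # 1 1
class(sd_tr)            # "numeric"
length(sd_tr)           # 1
```

Both objects hold one number, but only the first carries the row/column structure, which is why RStudio lists it under Data rather than Values.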

Those backticks delimit code chunks in Markdown files. The r part specifies which programming language the chunk contains; it could also be ```python, for example.

The L suffix denotes an "integer" value.
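A short sketch of the difference; `x`, `y` and `n` are just illustrative names.

```r
x <- 105246L   # the L suffix makes an integer literal
y <- 105246    # without it, R stores the number as a double ("numeric")

class(x)       # "integer"
class(y)       # "numeric"
typeof(y)      # "double"
x == y         # TRUE: same value, different storage type

# nrow() returns an integer, which is why the Values pane shows an L
n <- nrow(data.frame(a = 1:3))
class(n)       # "integer"
```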


Can you tell me what %>% means? I see it in a lot of commands.

Also, I understand data frames and single scalar values, but what specifically in the commands above determines the type?

This is called the "pipe" operator. It comes from the magrittr package (dplyr also re-exports it). It passes the object on its left side as the first argument to the function on its right side, which makes chaining commands easier. For example, these two are equivalent:

df %>% filter(post == 1, treatment == 1) %>%
  summarize(mean = mean(daily_kwh))

# The equivalent in regular (nested) syntax
summarize(filter(df, post == 1, treatment == 1), mean = mean(daily_kwh))
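To see the idea behind the pipe without any packages, here is a deliberately simplified, hypothetical `%pipe%` operator (it is not part of magrittr or any other package). Unlike `%>%`, it only accepts a bare function on the right-hand side, not a call such as `filter(post == 1)`, but it shows the core mechanism: the value on the left becomes the argument of the function on the right.

```r
# Hypothetical, simplified pipe: call the function on the right
# with the value on the left as its argument
`%pipe%` <- function(lhs, f) f(lhs)

x <- c(1, 4, 9)

# Nested, inside-out style: read from the innermost call outwards
nested <- sum(sqrt(x))

# Piped, left-to-right style: x, then sqrt(), then sum()
piped <- x %pipe% sqrt %pipe% sum

nested  # 6
piped   # 6
```

Since R 4.1, base R also has a native pipe, `|>`, which works with calls, e.g. `x |> sqrt() |> sum()`.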

I don't understand this question; can you elaborate?


One of the two commands gives a data set and the other gives a scalar value. How can I tell which one is the data set and which one is the scalar?

Sorry, I still don't understand what you mean.
If you want to know the class of any object, you can use the class() function, e.g.

class(iris)
[1] "data.frame"

If you mean how to know what class of object a particular function returns, then the only way is by reading the function's documentation.
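The documentation tells you in advance what a function returns; once you have called it, you can confirm what actually came back with `class()`, `is.data.frame()` or `str()`. A sketch using the built-in `mtcars` data set, with base R's `aggregate()` as an example of a function that returns a data frame and `mean()` as one that returns a plain number; `by_cyl` and `avg` are illustrative names.

```r
# aggregate() with a formula returns a data frame (one row per cyl value)
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# mean() returns a plain numeric vector of length 1
avg <- mean(mtcars$mpg)

class(by_cyl)       # "data.frame"
is.data.frame(avg)  # FALSE
class(avg)          # "numeric"

# str() prints a compact overview of any object's structure
str(by_cyl)
str(avg)
```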

post.treat.mean <- df %>% filter(post == 1, treatment == 1) %>%
  summarize(mean = mean(daily_kwh))

sd_tr <- sd(df$daily_kwh[df$post == 1 & df$treatment == 1], na.rm = TRUE)

Comparing these two commands, the first gives a table and the second gives a scalar value. What causes this difference?

Each function's purpose. There is no simple answer to this; as I said, you need to read the documentation for each function to find out, so a minimal understanding of the language is required.

Let me restate my question more clearly. Although post.treat.mean is a table (data set), it contains only a single number. In other words, there is only one element in this data frame.

sd_tr is also a single number, but it is a scalar.

You have made two statements, and both are correct, but there is no question in your post. Sorry, it seems we are facing some kind of language barrier here.

R is a dynamic programming language with dynamic typing.
The language itself therefore doesn't guarantee what types of data are passed into or returned from functions. Programmers and users have much greater freedom, at some risk of misunderstanding or doing harm. Through familiarity with the functions you use often, you will remember, or develop an instinct for, which types they require as parameters and which type to expect back from them. The documentation provides some help (depending on its quality and content).
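To illustrate dynamic typing, here is a small hypothetical function, `half()`, that declares no types at all; what it returns depends entirely on what you pass in.

```r
# No declared input or output types: the result's class follows the input
half <- function(x) x / 2

a <- half(10)                        # a plain number
b <- half(data.frame(v = c(2, 4)))   # arithmetic on a data frame keeps it a data frame

class(a)  # "numeric"
class(b)  # "data.frame"
b$v       # 1 2
```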

e.g.

?summarize

Usage
summarise(.data, ...)
Arguments
.data A tbl.
Value
An object of the same class as .data. One grouping level will be dropped.


This implies that summarise() takes tibbles/data frames as input and returns them as output.

Contrast this with:

?sd

Usage
sd(x, na.rm = FALSE)
Arguments
x a numeric vector or an R object but not a factor coercible to numeric by as.double(x)

sd() processes numeric vectors, and though the documentation doesn't specify that a numeric vector is returned, it is a safe assumption.
Side note: you could easily write a wrapper function, such as mysd(), that calls sd() for you and returns a tibble. A powerful thing about R is that you can make your own functions.
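A minimal sketch of that wrapper idea. `mysd()` is hypothetical, and to avoid a package dependency it returns a base-R `data.frame` rather than a tibble (`tibble::tibble()` could be used instead if the tibble package is installed).

```r
# sd() on its own returns a plain numeric vector of length 1
s <- sd(c(2, 4, 4, 4, 5, 5, 7, 9))

# Hypothetical wrapper: compute sd() and return it as a one-row data frame,
# giving the result the same shape as a summarize() output
mysd <- function(x, na.rm = FALSE) {
  data.frame(sd = sd(x, na.rm = na.rm))
}

res <- mysd(c(2, 4, 4, 4, 5, 5, 7, 9))

class(s)    # "numeric"
class(res)  # "data.frame"
```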

Do you mean that summarize() always gives you an object of the same class as .data, while sd() just gives you a number?

There is a minimum post length of 20 words so I can't just say yes but
Yes
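Since the result of `summarize()` is a data frame even when it holds a single number, you have to extract the column to get a plain number like `sd_tr`. A sketch assuming a one-row result with a column named `mean`, as produced by `summarize(mean = ...)` in the question (the value 12 is made up).

```r
# Stand-in for the summarize() result: one row, one column named "mean"
post.treat.mean <- data.frame(mean = 12)

# Two common ways to extract the single value as a plain number
v1 <- post.treat.mean$mean
v2 <- post.treat.mean[["mean"]]

class(post.treat.mean)  # "data.frame"
class(v1)               # "numeric"
identical(v1, v2)       # TRUE
```

With dplyr loaded, `pull()` does the same at the end of a pipe, e.g. `df %>% filter(post == 1, treatment == 1) %>% summarize(mean = mean(daily_kwh)) %>% pull(mean)`.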
