That question is harder to answer in general than it is once you know what your data looks like. So I'll cheat, as I've read your other question.

Let's start with 2.

`apply()` takes a data frame (or similar), and applies an operation on its rows or columns. Here we use `apply(..., 2, ...)`, so we apply the function on its columns. For example:

```
X <- data.frame(x1 = 1:3,
                x2 = 4:6)
X
#>   x1 x2
#> 1  1  4
#> 2  2  5
#> 3  3  6
apply(X, 2, min)
#> x1 x2
#>  1  4
apply(X, 2, max)
#> x1 x2
#>  3  6
```

Created on 2023-07-12 with reprex v2.0.2

So `apply()` is a way to write a loop. In other words, `apply(X, 2, max)` means "take X, and for each column of X take the max".
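To make the "loop" reading concrete, here is a minimal sketch of roughly what `apply(X, 2, max)` does, written as an explicit `for` loop. The name `col_max` is only for illustration, and the real `apply()` does more work internally (it first converts the data frame to a matrix).

```r
X <- data.frame(x1 = 1:3,
                x2 = 4:6)

# Pre-allocate one result per column, named like the columns of X
col_max <- setNames(numeric(ncol(X)), colnames(X))

# Visit each column in turn and store its maximum
for (j in seq_len(ncol(X))) {
  col_max[j] <- max(X[[j]])
}

col_max

# Compare with the apply() version
all.equal(col_max, apply(X, 2, max))
```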

Here we have:

```
apply(sdat[,-1], 2, e.function, seq=sdat[, 1])
```

That can be translated as "Take `sdat[,-1]`, and for each column of `sdat[,-1]` apply the function `e.function()`". But, as we'll see in a second, `e.function()` requires two parameters, `x` and `seq`. So `x` will be each column of `sdat[,-1]`, but we also need to provide `seq`. We can give it as the 4th argument: `seq = sdat[,1]`, which means the first column of `sdat`, which is `Sequence`.

So, what this does is, for each column of `sdat` except the first, pass that column as `x` and the first column as `seq`, and apply `e.function()`.
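The same mechanism can be tried on a toy example. Below, `scale_by()` and the data frame `X` are made up for illustration; the point is only that any extra named argument given to `apply()` after the function is passed along unchanged on every call.

```r
# Hypothetical helper: divides a column by a second vector of the same length
scale_by <- function(x, by) x / by

# Made-up data: `ref` plays the role of the first column
X <- data.frame(ref = c(1, 2, 4),
                a   = c(2, 4, 8),
                b   = c(3, 6, 12))

# Each column of X[, -1] becomes `x`;
# `by = X[, 1]` is handed to scale_by() unchanged for every column
res <- apply(X[, -1], 2, scale_by, by = X[, 1])
res
```

Here column `a` is divided element by element by `ref`, and then column `b` is divided by the same `ref`.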

Now let's go to 1. and the definition of `e.function()`. I should say `tapply()` can be used in many ways, and can be very confusing. Here, we have a simple case where both of its inputs are vectors (each one a single column of `sdat`).

`tapply()` takes argument `X`, a data vector, and `INDEX`, a grouping factor. It uses the grouping factor to "split" the data, and applies a function to each of the groups:

```
x <- 1:7
fac <- list(c("a","a","a","a","b","b","b"))
tapply(x, fac, min)
#> a b
#> 1 5
tapply(x, fac, max)
#> a b
#> 4 7
```
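If the "split, then apply" description feels abstract, the two steps can be written out separately. This is only a sketch of the idea using `split()` and `sapply()`, not how `tapply()` is implemented; the names `grp` and `pieces` are made up for the example.

```r
x   <- 1:7
grp <- c("a", "a", "a", "a", "b", "b", "b")

# Step 1: split the data into one piece per group
pieces <- split(x, grp)
pieces

# Step 2: apply the function to each piece
group_min <- sapply(pieces, min)
group_min
```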

Finally, let's put it back together:

```
e.function <- function(x, seq) tapply(x, seq, median)
temp <- apply(sdat[,-1], 2, e.function, seq=sdat[, 1])
```

What this does is take `sdat`, and separate the first column, which has protein sequences, from the other columns, which contain data. Then, for each data column, it takes the median by peptide.
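To see the shape of the result, here is a small made-up stand-in for `sdat`. Only the column name `Sequence` comes from your data; the sequences, the `sample1`/`sample2` columns and all values are invented, so treat this as a sketch rather than what your real output will look like.

```r
# Made-up data: the real `sdat` will have other sequences and columns
sdat <- data.frame(Sequence = c("PEPA", "PEPA", "PEPA", "PEPB", "PEPB"),
                   sample1  = c(1, 2, 9, 4, 6),
                   sample2  = c(10, 20, 30, 40, 50))

e.function <- function(x, seq) tapply(x, seq, median)

# One median per sequence, computed separately for each data column
temp <- apply(sdat[, -1], 2, e.function, seq = sdat[, 1])
temp
```

With this toy input, `temp` is a matrix with one row per distinct sequence and one column per data column.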