How do you speak R?

What

How do you write and verbalise R expressions? What are your thoughts on constructing English sentences from R expressions?

Maybe variable <- x %>% filter(y != z) translates to 'variable gets x then filter where y doesn't equal z'? Maybe you'd reference 'open paren', or 'open bracket', or similar? What else?

(I'm more concerned with 'translating' R expressions to sentences rather than how to pronounce package names, like here and here.)

Why

The work-in-progress package {r2eng} (GitHub, tweet) aims to make (opinionated) translations from R to English (and vocalise them) and is inspired by @AmeliaMN's recent useR! 2020 talk. It uses {lintr} to parse tokens from R expressions.
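
(For context, here's roughly the kind of token data a parser exposes. This sketch uses base R's getParseData() rather than {lintr}'s wrappers, purely to illustrate the raw material that gets mapped to English.)

# Illustrative only, using base R's parser rather than {lintr}
expr_text <- "variable <- x %>% filter(y != z)"
parsed <- parse(text = expr_text, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Keep the terminal tokens, in source order
tokens <- tokens[tokens$terminal, ]
tokens[order(tokens$line1, tokens$col1), c("token", "text")]
# Tokens include SYMBOL ("variable"), LEFT_ASSIGN ("<-"), SPECIAL ("%>%")
# and NE ("!="), which a translator can map to English words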

The point is to help communicate R code between learners and teachers, to help non-English speakers and to generally improve the accessibility of R in written and spoken form.

Edit: in case it wasn't clear, I'm the creator of {r2eng}, looking for input, advice, etc.


tbh, I read it as "filter x for all y unequal to z and assign to variable"


Even though you would read from left to right in English, in a programming language it's about precedence: <- binds less tightly than +, - or /.
With regard to that link, I'd suggest that you do not read the pipe as a "then" if it's the first pipe.
All following pipes might be translated to "then" :upside_down_face:
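
A tiny base R example of the precedence point, just to make it concrete:

x <- 1 + 2 / 2
# `/` binds tightest, then `+`, and `<-` last, so although you speak it
# left to right ("x gets one plus two divided by two"), evaluation works
# from the tighter-binding operators outwards before the assignment
x
# [1] 2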


Ah, interesting, thank you. What would you say is happening in your mind when you see an R expression? Are you looking ahead to work out the precedence? Does that get tricky when the expression is long, or composed of multiple piped expressions?

I just saw that you're the creator of the package, congrats on that! Looks like a lot of fun and a hard task.
How you read code depends on how it is written, how much you know about what you're seeing and, of course, why you're looking at it in the first place.

An example of why the whole thing is really difficult:

lapply <- function (X, FUN, ...) {
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X))
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}

This is just a small block of code; you can call it with lapply. My intention now is to see what lapply does. At first, I don't read "assign a function to lapply"; maybe I do, but I don't register it. I skip the "{" because it's always there, no need to read it. And "FUN <- match.fun(FUN)" is read as one expression, "match FUN". Then I read "check if it's not a vector or an object" (|| and | are so close it does not matter when reading). Then again a block of "convert it to a list and call C". But this is completely different from teaching someone a code block. Then I would start with "I assign a function to lapply with the arguments ..." etc.

If you compare two code blocks:

library(magrittr)  # for add(), magrittr's alias for +, and the %>% pipe used below
x <- 1
x <- add(x, 1)
x <- add(x, 1)
x <- add(x, 1)
x <- add(x, 1)
x <- add(x, 1)
x <- add(x, 1)
x <- add(x, 1)
x <- add(x, 1)
x <- add(x, 1)
x <- add(x, 1)

and

x <- 1
x <- x %>% 
       add(1) %>% 
       add(1) %>% 
       add(1) %>% 
       add(1) %>% 
       add(1) %>% 
       add(1) %>% 
       add(1) %>% 
       add(1) %>% 
       add(1) %>% 
       add(1)

I read the same thing; I read each block as one expression. Now it gets really complicated for you, because when "reading" the expression aloud you can only go line by line, expression after expression. When reading line by line you are forced to speak the assignment to x every time, but that happens so fast in my brain that I don't read it, because the pattern is everywhere.
Really complicated stuff!


Thank you @mikeR, this is really helpful and exactly the kind of insight I was looking for.

I do wonder if the task is simply too complex for a simple token-to-text conversion, even with some 'looking ahead' to rearrange the output into something more appropriate than a direct left-to-right translation. But perhaps there are some commonalities in people's thinking that could be made into relatively simple rules, like orders of precedence and locating the 'important' parts of an expression. Easier said than done, of course.

(Otherwise we might need some kind of GPT-3-powered English/R translator, a bit like this!)

Haha, the GPT-3 translator looks dope! But it looks like yet another dumb AI: it doesn't know context, and it doesn't know what a circle is if you don't teach it.
I think it could work if you focus only on base R and common infix operators, special symbols and such things. But as I said earlier, precedence is highly important and that is the selling point! That's the difference from a screen reader!


Amelia McNamara's talk, "Speaking R", at useR! 2020 last week is very relevant to this discussion.


Hi, great challenge, I guess. The task may be much more complex than simply extracting tokens and so on. That's because when you are an experienced R programmer you don't read code literally; rather, you catch the whole context around it. The example with lapply above illustrates this well. The other thing is that code can be written really badly, and then it can be very hard to read even for R heroes.
But there is hope :slight_smile: If code is written well you won't find very long and complex expressions, because experienced R programmers divide it into small parts that are much easier to read and understand. And you can often find very similar constructs, so maybe you can approach your task this way: analyse bigger parts with similar architecture?


Thank you, for sure; it was the genesis for {r2eng}. Amelia's talk does a great job of explaining the complexities, as do @mikeR and @olibravo in this thread.

I think the goal of {r2eng} is really to do the left-to-right, 'typing with your tongue', approach that Amelia mentioned, where the value is in translating != to 'not equal to' rather than 'exclamation point equals' as a screen reader might do.


Thank you. Given this complexity, I think it makes sense for {r2eng} to focus on the simpler end of the spectrum: the left-to-right, symbol-by-symbol translation of simple expressions. This is probably most beneficial when you literally want an R expression read aloud, as a beginner or learner, or want something vocalised with a little more specificity to R code than a screen reader might manage.
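
To make that concrete, here's a very rough sketch of what a left-to-right, token-by-token translation could look like. The lookup table and the function name are hypothetical, not how {r2eng} is actually implemented; it's just to show the shape of the simple end of the spectrum.

# Hypothetical sketch only, not the {r2eng} implementation
speak_r <- function(expr_text) {
  tokens <- utils::getParseData(parse(text = expr_text, keep.source = TRUE))
  tokens <- tokens[tokens$terminal, ]
  tokens <- tokens[order(tokens$line1, tokens$col1), ]  # source order

  # A tiny, opinionated token-to-English lookup
  eng <- c(
    "LEFT_ASSIGN" = "gets",
    "SPECIAL"     = "then",          # e.g. %>%
    "NE"          = "not equal to",
    "'('"         = "open paren",
    "')'"         = "close paren"
  )

  words <- ifelse(tokens$token %in% names(eng), eng[tokens$token], tokens$text)
  paste(words, collapse = " ")
}

speak_r("variable <- x %>% filter(y != z)")
# "variable gets x then filter open paren y not equal to z close paren"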

Suggestion:
As soon as the package is mature enough, you could add an Addin to RStudio with a keyboard shortcut such that you can listen to a selected code block!
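
In case it helps, the general shape of an RStudio addin is a function that reads the current selection via {rstudioapi}, plus an entry in inst/rstudio/addins.dcf. The sketch below uses a made-up translate_to_english() placeholder and a macOS-only say call for the speech step; it isn't the actual {r2eng} addin.

# Hypothetical sketch of an addin function
speak_selection <- function() {
  context <- rstudioapi::getActiveDocumentContext()
  selected <- context$selection[[1]]$text

  english <- translate_to_english(selected)  # placeholder translation step
  system(paste("say", shQuote(english)))     # text-to-speech, macOS only
}

# inst/rstudio/addins.dcf
# Name: Speak selected R code
# Description: Read the selected R code aloud in English
# Binding: speak_selection
# Interactive: false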


Great idea @mikeR, thanks. So great that I went ahead and did it in a spare 5 mins! You can read about it in the README here: https://github.com/matt-dray/r2eng#rstudio-addin


Aha. Somehow I missed the beginning of the conversation. Awesome idea for a package and great progress.

Yes, I think there is more thinking to be done about the different "levels" of reading code out loud. There is "typing with your tongue," there is a sort of overview way of speaking things (maybe dropping brackets, for example) and then a super-high-level "what's the big idea" kind of reading.

There's also the idea of audience. The way you "read" code differs depending on whether you are teaching sighted college students who are new to coding, reading silently inside your own head, reading as a blind person does (Andreas Stefik had a great keynote the same week as my useR! keynote, where he talked about the idea of "skimming" code for the blind and visually impaired), or reading an English-based programming language when your first human language isn't English, etc.

I agree that automated reading out loud in the "typing with your tongue" style is the easiest to implement, although it probably only works in error-free code. I'll reproduce this tweet from Neil Brown because I think it's so relevant:

Not to mention context-sensitive syntax. e.g. Java 1<2 should be pronounced less-than by screen reader. But List<String>; is it "less-than String greater-than" or a custom pronunciation? Problem with latter is that a mismatch like List<String makes it "less-than" again

It might be useful to develop a list of places where this kind of mispronunciation could happen in R. One example from my talk was the vbar, |, which can mean "or" or "given" depending on context.
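
To put that in code: both snippets below contain a |, but a human would voice them differently, and a purely token-based approach can't tell them apart.

x <- c(-1, 2)
y <- c(3, -4)

x > 0 | y > 0           # logical operator: read | as "or"
# [1] TRUE TRUE

form <- y ~ x | group   # in a formula (e.g. for lattice), read | as "given"
form
# y ~ x | group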


Thank you very much, Amelia.

You're right, there are definitely 'levels' to consider. There's a lot of complexity in user personas and human language, for sure. This emphasises the need to make the purpose and limitations of {r2eng} clear.

Simpler is probably better for now, but some good contextual awareness is necessary. Along with Neil's context-sensitivity example, the {lintr} parser in {r2eng} also 'misses' tokens like !! (read as 'not not'). These examples definitely require better logic than the simple one-to-one mapping of tokens to English.
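
For instance, the parser really does see !! as two separate '!' operators, which is why a naive token-to-word mapping produces 'not not':

# Base R illustration: there is no single !! token in the grammar
tokens <- utils::getParseData(parse(text = "!!x", keep.source = TRUE))
tokens <- tokens[tokens$terminal, ]
tokens[order(tokens$col1), c("token", "text")]
# The terminal tokens are '!', '!' and SYMBOL ("x")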

Andreas's talk was also interesting, thank you. His screen reader's lack of code awareness made clear the accessibility problem. He also made some interesting points about navigating by abstract syntax trees and semantic prioritisation. {r2eng} shouldn't place itself as an 'answer' to this, but may be useful for some users.

Bottom line: more user testing!
