BoF at rstudio::conf - Natural Language Processing



Natural Language Processing / NLP

Keywords: Natural Language Processing, Natural Language Processing in R, NLP, Text analysis
Hosted by @julia
When and where: Thurs 10:30-11AM in the BoF Lounge 2


If you'd like to get notifications about this group, be sure that you are "Watching" this topic-thread.

If you would like to focus on a specific topic within this category, or ensure you are connecting with the right folks, reply below, discuss, and share widely!

What is a Birds of a Feather Session? Learn more at the BoF Directory


I'm excited about hosting this BoF session at rstudio::conf! :tada: Who thinks they might come, and what kinds of topics would be fun to chat about in a casual, face-to-face setting?


Hey :slight_smile:

I'd be interested in hearing if anyone's using deep learning for NLP (you guys at SO certainly must be ;-))?

Especially in light of the latest advances in DL for NLP due to transfer learning (BERT, ULMFiT, ELMo...)


Yep, we are. I just finished running a concept extraction model (BiLSTM-CRF network) this morning to see how this baseline model would handle part of our data. Not too bad for the first try via transfer learning.


Cool! Are you extracting embeddings from ELMo and then building your own classifier on top, as (probably) done here:


Precisely. I'm using the approach just presented at the NeurIPS 2018 conference, described here:


Our application is related to free-form medical notes, which are notoriously more difficult (and yield less accurate results) than the usual NLP datasets. To make things even MORE challenging, our domain is veterinary medicine. So we have text that may look like “the boy just ain’t right”, with no ICD-10 codes for outcomes. :confused:


We've been working with some ULMFit models internally, which has been really interesting and fun. I'm looking forward to getting to chat in person at rstudio::conf with some of you about the kinds of work you are doing!


Interesting, Julia. I'm trying to get a feel for the relative merits of ULMFiT compared to ELMo. It's interesting that ELMo seems to benefit from training on a domain-specific dataset... even though it is supposed to be more general than word embeddings. Have you found the same or similar observations with ULMFiT?


Hi Julia!
ULMFiT looks promising for a problem that I'm working on (essentially a text classification problem with quite small data sets). I'd love to pick your brain about it.


Oh my goodness, @jbratt :scream::wave: What a surprise!

(We worked together at a previous job.)

@DPaschall I haven't tried ELMo on the same datasets that we are using with ULMFit so I don't know if I can speak to a direct performance comparison at this point. However, we are doing something similar where we have a large dataset of domain-specific language, and then a quite small dataset of labeled data for the classifier. It's remarkable what good results we are getting!


That sounds a lotttttt like what we're dealing with (hi there, I'm here, too!). This is definitely intriguing!


We mostly use deep learning for NLP at T-Mobile!

(our primary dataset for this is transcripts of customers chatting with T-Mobile representatives)

We wrote a blog post about a simple way we use the R keras APIs in production, and open sourced our containers!

We also worked with AWS on SageMaker Ground Truth, which is a pretty stellar way of getting labeled categorical data, imo.


Hi @julia, I'll be there if there's room left.


There certainly is! I think we'll be a cozily small group for chatting about our text mining and NLP tasks.


I will be there. Thanks.


Great! Looking forward to the conversation @BButler.


I'm looking forward to being there! Thank you.


I'm planning to join too. I'm not doing a ton of NLP for work at the moment, but I have some fun datasets in the works that I'm excited to explore further.