BoF at rstudio::conf - Natural Language Processing

julia · December 7, 2018, 11:59am

Natural Language Processing / NLP

Keywords: Natural Language Processing, Natural Language Processing in R, NLP, Text analysis
Hosted by @julia
When and where: Thurs 10:30-11AM in the BoF Lounge 2

Interested?

If you'd like to get notifications about this group, be sure that you are "Watching" this topic-thread.

If you would like to focus on a specific topic within this category, or ensure you are connecting with the right folks, reply below, discuss, and share widely!

What is a Birds of a Feather Session? Learn more at the BoF Directory

julia · December 10, 2018, 4:33pm

I'm excited about hosting this BoF session at rstudio::conf! Who thinks they might come, and what kinds of topics would be fun to chat about in a casual, face-to-face setting?

zkajdan · December 13, 2018, 4:01pm

Hey

I'd be interested in hearing if anyone's using deep learning for NLP (you guys at SO certainly must be ;-))?

Especially in light of the latest advances in DL for NLP due to transfer learning (BERT, ULMFiT, ELMo...)

DPaschall · December 13, 2018, 4:19pm

Yep, we are. I just finished running a concept extraction model (BiLSTM-CRF network) this morning to see how this baseline model would handle part of our data. Not too bad for the first try via transfer learning.

zkajdan · December 13, 2018, 5:58pm

Cool! Extracting embeddings from ELMo, then building your own classifier on top, as (probably) done here: https://github.com/UKPLab/elmo-bilstm-cnn-crf?

DPaschall · December 13, 2018, 8:15pm

Precisely. I'm using the approach just presented at the NeurIPS 2018 conference, described here: https://arxiv.org/pdf/1810.10566.pdf

DPaschall · December 13, 2018, 8:18pm

Our application is related to free-form medical notes, which are notoriously more difficult to work with (and yield less accurate results) than the usual NLP dataset. To make things even MORE challenging, our domain is veterinary medicine. So we have text that may look like “the boy just ain’t right”, with no ICD-10 codes for outcomes.

julia · December 14, 2018, 3:41am

We've been working with some ULMFiT models internally, which has been really interesting and fun. I'm looking forward to getting to chat in person at rstudio::conf with some of you about the kinds of work you are doing!

DPaschall · December 14, 2018, 2:18pm

Interesting, Julia. I'm trying to get a feel for the relative merits of ULMFiT compared to ELMo. It's interesting that ELMo seems to benefit from training on a domain-specific data set, even though it is supposed to be more general than word embeddings. Have you made the same or similar observations with ULMFiT?

jbratt · December 17, 2018, 5:00pm

Hi Julia!
ULMFiT looks promising for a problem that I'm working on (essentially a text classification problem with quite small data sets). I'd love to pick your brain about it.

julia · December 18, 2018, 1:20am

Oh my goodness, @jbratt What a surprise!

(We worked together at a previous job.)

@DPaschall I haven't tried ELMo on the same datasets that we are using with ULMFiT, so I don't know if I can speak to a direct performance comparison at this point. However, we are doing something similar where we have a large dataset of domain-specific language, and then a quite small dataset of labeled data for the classifier. It's remarkable what good results we are getting!

jonthegeek · December 18, 2018, 2:17am

That sounds a lotttttt like what we're dealing with (hi there, I'm here, too!). This is definitely intriguing!

heatherklus · December 31, 2018, 10:47pm

We use mostly deep learning for NLP at T-Mobile!

(our primary dataset for this is transcripts for customers chatting with T-Mobile representatives)

We wrote a blog post about a simple way we use r-keras APIs in production - and open-sourced our containers!
https://opensource.t-mobile.com/blog/posts/r-tensorflow-api/

We also worked with AWS on SageMaker Ground Truth, which is a pretty stellar way of getting labeled categorical data imo

StasK · January 10, 2019, 9:56pm

Hi @julia I'll be there if there's room left

julia · January 11, 2019, 12:00am

There certainly is! I think we'll be a cozily small group for chatting about our text mining and NLP tasks.

BButler · January 12, 2019, 4:03pm

I will be there. Thanks.

julia · January 12, 2019, 7:11pm

Great! Looking forward to the conversation @BButler.

ricksnell · January 14, 2019, 12:38am

I'll look forward to being there! Thank you.

caitlinhudon · January 16, 2019, 2:39am

I'm planning to join too. Not doing a ton of NLP for work at the moment, but I have some fun datasets in the works that I'm excited to explore further.