Is natural language processing available and reliable for extracting data from medical records?


I have been offered an interview for a position that deals with extracting data from medical records. I would have to use natural language processing, but the last time I looked at NLP I was told that it was not ready to be used yet and that the extraction would therefore contain mistakes. I know that there is an NLP library in R, but I do not know how well developed and reliable it is. There may be other tools on the market. Could anyone please advise me on the reliability of the R library or any other available tools, and on the fields in which these NLP tools are used?

Hello gcefalu.
As far as I know, you have some good options.

  1. A Guide to Using spacyr
    You can use spacyr with miniconda. (If you feel comfortable with the original spaCy, even better.)
  2. You can try RWeka, a very good option for NLP and text mining (CRAN - Package RWeka). Note that Weka also has its own standalone environment.
  3. You can try running the process through Microsoft Azure (Using R in Azure Machine Learning Studio). The problem is the cost, but it will give you a reliable option.
    It is possible to do the same on Amazon, but honestly I don't know whether it has an R integration.

Hope this helps.

Could you let me know whether there are any NLP libraries for R? I just need to read text. All the statistics and machine learning are done in R.

R has some really great NLP libraries. @AndyR gave you a link showing the entire NLP ecosystem in R, which should make it clear that there are many tools you can use to do great stuff in R.

If you are interested in a great NLP tutorial, check this talk by Cosima Meyer at R-Ladies Tunis. I encourage you to follow the tutorial and code along (check the description for the code and data).

Thank you very much for the info.

I have been looking at the software suggested above and at the R libraries, and it seems to me that they are natural language processing research tools based on tokenization, graphical analysis and training on data sets. However, I cannot find a library or off-the-shelf tool that can read medical records and extract the meaning correctly. For example, “infection due to HIV” should not be treated the same as “infection from respiratory disorder”, and “patient has the disease” should not be treated the same as “patient does not have the disease”. Are there any available solutions? Or maybe natural language processing cannot yet be used to find the meaning of written text without error?
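
To make the "has the disease" versus "does not have the disease" example concrete, here is a minimal sketch of a rule-based negation check, loosely in the spirit of the NegEx idea used in clinical NLP. It is written in Python with only the standard library, so it is not tied to any of the packages mentioned above. The trigger list, the `WINDOW` size and the function name `is_negated` are all assumptions made for illustration, not part of any real library, and this is nowhere near clinical grade.

```python
import re

# Hypothetical, deliberately tiny trigger list (an assumption for this sketch);
# real clinical systems rely on much larger, curated lexicons.
NEGATION_TRIGGERS = ["no", "not", "denies", "without", "negative for"]

# Assumed scope: a trigger only counts if it appears within this many
# word tokens before the term we are checking.
WINDOW = 5


def is_negated(sentence: str, term: str) -> bool:
    """Return True if `term` occurs in `sentence` and a negation trigger
    appears within WINDOW word tokens before its first occurrence."""
    text = sentence.lower()
    match = re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    if match is None:
        # Term not mentioned at all, so there is nothing to negate.
        return False
    # Keep only the last WINDOW word tokens that precede the term.
    preceding_tokens = re.findall(r"[a-z']+", text[:match.start()])
    window = " ".join(preceding_tokens[-WINDOW:])
    # A trigger counts only when it matches as a whole word or phrase.
    return any(
        re.search(r"\b" + re.escape(trigger) + r"\b", window)
        for trigger in NEGATION_TRIGGERS
    )


if __name__ == "__main__":
    for s, t in [
        ("Patient does not have pneumonia.", "pneumonia"),
        ("Patient has pneumonia.", "pneumonia"),
        ("Patient denies fever but has cough.", "cough"),
    ]:
        print(f"{s!r} / {t!r} -> negated={is_negated(s, t)}")
```

The first two sentences come out as expected, but the third one is wrongly reported as negated for "cough", because this sketch has no notion of where the scope of "denies" ends. That is the kind of mistake simple rules make, so any extraction pipeline built this way would still need to be validated against manually reviewed records.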
