I have to extract parts of some strings and modify them using regex (think URL validation/modification, for instance) in a data frame of about 30 million rows. What general advice should one follow? I am aware this is not specifically an R question, but I suppose there may be R-specific aspects to it.
It may be worth splitting the data and processing the chunks in parallel. That should be straightforward using parallel::parSapply (for example), but only if stringi isn't already fast enough on its own.
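A minimal sketch of the split-and-parallelize idea. It assumes the URLs sit in a plain character vector `urls` and that the modification is "force https and drop the query string". Both the sample data and the `clean_urls()` helper are hypothetical stand-ins for the real task. It uses `parallel::parLapply()` instead of `parSapply()` so that each worker returns its chunk as one list element, which `unlist()` then joins back in the original order.

```r
library(parallel)
library(stringi)

# Hypothetical input: stand-in for the 30M-row character column
urls <- rep(c("http://example.com/a?x=1", "https://www.test.org/b"), 5e4)

# Hypothetical transformation: force https, then drop any query string
clean_urls <- function(x) {
  x <- stringi::stri_replace_first_regex(x, "^http://", "https://")
  stringi::stri_replace_first_regex(x, "\\?.*$", "")
}

# Leave one core free; detectCores() can return NA on some platforms
n_workers <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

# One contiguous chunk per worker; integer group ids keep chunks in order
chunks <- split(urls, cut(seq_along(urls), n_workers, labels = FALSE))

cl <- parallel::makeCluster(n_workers)
invisible(parallel::clusterEvalQ(cl, library(stringi)))
res <- parallel::parLapply(cl, chunks, clean_urls)
parallel::stopCluster(cl)

# Stitch the per-chunk results back into a single vector
out <- unlist(res, use.names = FALSE)
```

On a socket cluster like this one, each chunk is copied to the workers. For cheap patterns, that serialization overhead can eat most of the gain, so it is worth timing the single-threaded `clean_urls(urls)` call on a sample first.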
Performance can differ a lot between simple, static (fixed) patterns and complex regex patterns that involve lookaround.
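One way to check this on your own data is to time a literal (fixed) search against equivalent regex searches with stringi. The sample vector and patterns below are hypothetical. All three calls flag the same rows, so any difference in elapsed time comes from how the pattern is matched, not from what it matches.

```r
library(stringi)

# Hypothetical sample; substitute a slice of the real column
x <- rep(c("https://www.example.com/page", "http://test.org/index.html"), 5e5)

# Static pattern: literal substring search, no regex engine involved
t_fixed <- system.time(a <- stri_detect_fixed(x, "example.com"))

# Same search as a regex (the dot is escaped to mean a literal dot)
t_regex <- system.time(b <- stri_detect_regex(x, "example\\.com"))

# Regex with a lookahead: "www." only when followed by "example"
t_look <- system.time(d <- stri_detect_regex(x, "www\\.(?=example)"))

stopifnot(identical(a, b), identical(a, d))

# Elapsed seconds for each approach (machine-dependent)
c(fixed = t_fixed[["elapsed"]],
  regex = t_regex[["elapsed"]],
  lookahead = t_look[["elapsed"]])
```

In base R the same distinction applies: `grepl(pattern, x, fixed = TRUE)` skips the regex engine entirely, while lookaround requires `perl = TRUE`, because the default engine does not support it.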