How to mutate a new column with the extracted numeric values from another column in the same table in a remote PostgreSQL Database.

Hi,

I am new to R and I need to find a way to do the following:

I have access to a huge remote PostgreSQL database, in this database I have a table called "occurrence" and this table has a column called "uri", it contains a list of URI (links to a webpage). Each entry in this column have this format: "abc://abc- def-ghi-abc/12345".. The only thing that changes in the column is the number 12345, the text part (the URI) stays the same throughout the column.

My question is, how can I manage to create(mutate) a new column in the same table, and this new column will be named uri_id and MUST contain ONLY the numeric part extracted from the above mentioned "uri" column.

Example:

Remote PostgreSQL Database > Metadata > Table"occurrence":

id sub_id uri
3654 5741 abc://abc- def-ghi-abc/12345
9784 5742 abc://abc- def-ghi-abc/45789
9751 5743 abc://abc- def-ghi-abc/97856
9794 5746 abc://abc- def-ghi-abc/69785

Results should look like this:

id sub_id uri uri_id
3654 5741 abc://abc-de-fgh.abc/12345 12345
9784 5742 abc://abc-de-fgh.abc/45789 45789
9751 5743 abc://abc-de-fgh.abc/97856 97856
9794 5746 abc://abc-de-fgh.abc/69785 69785

First i defined the table that contains this coumn
occurrence <- tbl(db_name, in_schema("metadata", "occurrence"))

Print(occurrence) returns the table normally...

then I tried this
str_replace(occurrence$uri, "abc://abc- def-ghi-abc/", "")
it returned:
character (0)
Printing it or exporting it would give Null and an empty table.

I also tried this:

uri_id <- mutate(uri_id = as.numeric(str_extract(occurrence$uri, "[0-9]+")))
it returned this error:
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"

I tried to just substitute text elements like this:
uri_id <- mutate(uri_id = as.numeric(gsub(".?([0-9]+).", "\1", occurrence$uri)))
print(uri_id)

it returned the same error!

I tried tidyr/ extract and extract_:
occurrence$uri %>% extract_(occurrence$uri, "abc://abc- def-ghi-abc/")
it returned this error:
Error in UseMethod("extract_") :
no applicable method for 'extract_' applied to an object of class "NULL"

I would really appreciate your help in choosing the right way to achieve this task.

Thanks.

This is how to do it

library(dplyr)
library(stringr)

sample_df <- data.frame(
  stringsAsFactors = FALSE,
                id = c(3654, 9784, 9751, 9794),
            sub_id = c(5741, 5742, 5743, 5746),
               uri = c("abc://abc-def-ghi-abc/12345",
                       "abc://abc-def-ghi-abc/45789",
                       "abc://abc-def-ghi-abc/97856","abc://abc-def-ghi-abc/69785")
)

sample_df %>% 
    mutate(uri_id = str_extract(uri, "\\d+$"))
#>     id sub_id                         uri uri_id
#> 1 3654   5741 abc://abc-def-ghi-abc/12345  12345
#> 2 9784   5742 abc://abc-def-ghi-abc/45789  45789
#> 3 9751   5743 abc://abc-def-ghi-abc/97856  97856
#> 4 9794   5746 abc://abc-def-ghi-abc/69785  69785

Created on 2021-02-28 by the reprex package (v1.0.0.9002)

Note: Next time please provide a proper REPRoducible EXample (reprex) illustrating your issue.

1 Like

Thank you. It did return character 0. I managed to make it work using dbGetQuery by embedding an SQL substring code:

uri_id <- dbGetQuery(
db_mdr,
"SELECT uri, entity_id,
substring(uri,33,37)
FROM metadata.occurrence
where uri notnull"
)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.