How can I reduce memory usage of SPARQL package?

Hi All,

I am trying to run the query below using the SPARQL package. With the default XML output option, the resulting data frame is only c. 30 MB, but the R session uses c. 3 GB of memory while running the query, and it is slow.

Using the TSV output option takes only c. 500 MB of memory and is quicker. Running the same query in the Land Registry console is fast, so I think the issue is with how the package processes the returned data, rather than with the SPARQL query itself.

Ideally I need to get memory usage below 1 GB, as that is my limit on shinyapps.io. Does anyone know of a way to do this? I haven't managed to get the TSV output into a usable format; it comes back as a single-column data frame.

Thanks in advance for any tips

# Step 1 Load Packages ----
library(SPARQL)

# Step 2 Set Query Endpoint ----
endpoint <- "https://landregistry.data.gov.uk/landregistry/query"

# Step 3 Define Query ----
query <- '
  prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
  prefix ukhpi: <http://landregistry.data.gov.uk/def/ukhpi/>

  SELECT
    ?stripped_regionName ?stripped_date ?ukhpi ?avprice ?volume ?newbuildvolume ?regionId
  WHERE
  {
    ?region ukhpi:refRegion       ?regionId .
    ?region ukhpi:refMonth        ?date .
    ?region ukhpi:housePriceIndex ?ukhpi .

    ?regionId rdfs:label ?regionName .
    FILTER (langMatches(lang(?regionName), "EN"))
    BIND (STR(?regionName) AS ?stripped_regionName)
    BIND (STR(?date)       AS ?stripped_date)

    OPTIONAL { ?region ukhpi:averagePrice        ?avprice . }
    OPTIONAL { ?region ukhpi:salesVolume         ?volume . }
    OPTIONAL { ?region ukhpi:salesVolumeNewBuild ?newbuildvolume . }
  }
'

# Step 4 Use SPARQL Package to Submit Query and Save Data ----
qd <- SPARQL(endpoint, query)
hpi_df <- qd$results

# Step 4 Alternative - Use TSV ----
#tsv <- SPARQL(endpoint, query, format = "tsv")
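If the package's TSV parsing is what produces the single-column data frame, one workaround is to fetch the TSV yourself via the SPARQL 1.1 Protocol and parse it with base R, bypassing the package entirely. This is only a sketch: it assumes the endpoint honours the Accept: text/tab-separated-values header, and httr is an added dependency not in the original code.

# Step 4 Alternative B - Request TSV directly and parse it (sketch) ----
# Assumption: the endpoint supports the SPARQL 1.1 Protocol and TSV
# content negotiation; httr is not part of the original post.
library(httr)

res <- POST(
  endpoint,
  body = list(query = query),   # form-encoded query, per the protocol
  encode = "form",
  add_headers(Accept = "text/tab-separated-values")
)
stop_for_status(res)

# SPARQL TSV headers keep the leading "?" (e.g. "?stripped_date"),
# so read without name mangling and strip it afterwards
hpi_df <- read.delim(
  text = content(res, as = "text", encoding = "UTF-8"),
  check.names = FALSE
)
names(hpi_df) <- sub("^\\?", "", names(hpi_df))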

I would say, if it's an inefficient architecture, then take it offline, i.e.
use your local computer to query the data, prepare it, and load that prepared data into your app.
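For example, a minimal sketch of that workflow (the file name is hypothetical):

# Offline, on your local machine: run the query once and save the result
qd <- SPARQL(endpoint, query)
saveRDS(qd$results, "hpi_df.rds")

# In the Shiny app: load the prepared data instead of querying live
hpi_df <- readRDS("hpi_df.rds")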

Thanks for your suggestion. I've managed to get it to work this way; the issue is that the data is updated each month. Ideally I'd like to be able to set the app running and leave it, with the data refreshing automatically each month.

Now you mention it, though, I wonder if a solution would be to:

1. prepare the current data offline,
2. set up the query to request only data points after the last date for which data exists, and
3. merge the new data.

This will be a bit more complex than I had hoped, but it may be good R practice!
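A rough sketch of steps 2 and 3, assuming the previously prepared data is saved as hpi_df.rds and that STR() on the ukhpi:refMonth values yields "YYYY-MM" strings that compare correctly in lexical order; the extra FILTER on ?date is my addition, not part of the original query:

# Sketch: fetch only months after the latest date already held locally
hpi_df <- readRDS("hpi_df.rds")            # previously prepared data
last_date <- max(hpi_df$stripped_date)

incremental_query <- sprintf('
  prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
  prefix ukhpi: <http://landregistry.data.gov.uk/def/ukhpi/>

  SELECT
    ?stripped_regionName ?stripped_date ?ukhpi ?avprice ?volume ?newbuildvolume ?regionId
  WHERE
  {
    ?region ukhpi:refRegion       ?regionId .
    ?region ukhpi:refMonth        ?date .
    ?region ukhpi:housePriceIndex ?ukhpi .

    ?regionId rdfs:label ?regionName .
    FILTER (langMatches(lang(?regionName), "EN"))
    FILTER (STR(?date) > "%s")             # only months after last_date
    BIND (STR(?regionName) AS ?stripped_regionName)
    BIND (STR(?date)       AS ?stripped_date)

    OPTIONAL { ?region ukhpi:averagePrice        ?avprice . }
    OPTIONAL { ?region ukhpi:salesVolume         ?volume . }
    OPTIONAL { ?region ukhpi:salesVolumeNewBuild ?newbuildvolume . }
  }', last_date)

new_rows <- SPARQL(endpoint, incremental_query)$results
hpi_df <- rbind(hpi_df, new_rows)          # rbind matches columns by name
saveRDS(hpi_df, "hpi_df.rds")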
