Hi All,
I am trying to run the query below using the SPARQL package. With the default XML output option the resulting dataframe is only c.30MB, but the R session takes c.3GB of memory to run and is slow.
Using the TSV output option instead takes only c.500MB of memory and is quicker. The Land Registry query console is fast, so I think the issue lies in how the package parses the results, rather than in the SPARQL query itself.
Ideally I need to get the memory usage below 1GB, as that is my limit on Shiny.io. Does anyone know of a way to do this? I haven't managed to get the TSV output into a usable format: it comes back as a single-column dataframe.
Thanks in advance for any tips.
# Step 1 Load Packages ----
library(SPARQL)
# Step 2 Set Query Endpoint ----
endpoint <- "https://landregistry.data.gov.uk/landregistry/query"
# Step 3 Define Query ----
# paste() with a single string is a no-op, so a plain string literal is enough
query <- '
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix ukhpi: <http://landregistry.data.gov.uk/def/ukhpi/>
SELECT
?stripped_regionName ?stripped_date ?ukhpi ?avprice ?volume ?newbuildvolume ?regionId
WHERE
{
?region ukhpi:refRegion ?regionId .
?region ukhpi:refMonth ?date .
?region ukhpi:housePriceIndex ?ukhpi .
?regionId rdfs:label ?regionName
FILTER (langMatches( lang(?regionName), "EN") ) .
BIND (STR(?regionName) AS ?stripped_regionName) .
BIND (STR(?date) AS ?stripped_date) .
OPTIONAL {?region ukhpi:averagePrice ?avprice .}
OPTIONAL {?region ukhpi:salesVolume ?volume .}
OPTIONAL {?region ukhpi:salesVolumeNewBuild ?newbuildvolume .}
}
'
# Step 4 Use SPARQL Package to Submit Query and Save Data ----
qd <- SPARQL(endpoint, query)
hpi_df <- qd$results
# Step 4 Alternative - Use TSV ----
#tsv <- SPARQL(endpoint, query, format = "tsv")
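For reference, here is a minimal sketch of how a single-column TSV result might be reshaped into a proper dataframe, assuming the package hands back the raw tab-separated text. The `tsv_text` value below is a made-up stand-in for whatever `tsv$results` actually contains, so the field names and rows are illustrative only:

```r
# Stand-in for the raw TSV text the package might return; in practice this
# would come from tsv$results after SPARQL(endpoint, query, format = "tsv").
tsv_text <- paste(
  "?stripped_regionName\t?stripped_date\t?ukhpi",
  "England\t2020-01-01\t120.5",
  "Wales\t2020-01-01\t115.2",
  sep = "\n"
)

# read.delim() over a textConnection() splits the tab-separated text into
# proper columns, instead of leaving each row as one long string.
hpi_df <- read.delim(textConnection(tsv_text),
                     sep = "\t",
                     stringsAsFactors = FALSE,
                     check.names = FALSE)
```

If the package instead returns a one-column dataframe of row strings, pasting those rows together with `paste(x, collapse = "\n")` first and then running the same `read.delim()` call should give the same result.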