Working with multiple .osm files [OpenStreetMap]

I'm working on a project where I have to determine the shortest path between two nodes (using the shortest.path function in R). For that, I am using an open data source (OpenStreetMap). The problem is the size of my .osm file (an .osm file contains all the information for a given map: all nodes, ways and relations). RStudio can't handle such a large file (mine is 3.2 GB). My idea was to split the .osm file into several smaller files. The question is: how can I load them in my code to determine the shortest path, given that the information for each node/way will be spread across different .osm files?

I hope someone can help me.
Thank you,

Actually, RStudio can manage a file of that size; the real limitation is your system's memory (RAM). Even if you split the file, you will run into the same issue if you still need to load all the pieces into memory at the same time.

Try to rephrase your problem as a REPRoducible EXample (reprex) so other people can see which specific libraries or approach you are using; they may then be able to give you better help.

There are probably others more experienced with R + OSM who could speak to preexisting libraries designed for this, but from working with big OSM data in the past, there are a few things you could try:

  • See if an existing package can handle your data (off the top of my head, osmdata may have a solution, but it looks like more of a querying package than a reading package).
  • Chances are you don't need all the information in the .osm file, and while it isn't common to write an event-based XML parser in R, it looks like it is possible. If you only include certain information as you read the file, there is a good chance your OSM data will be well within a reasonable memory usage, although to query the nodes effectively you'd have to arrange the output in some kind of data frame.
  • There are excellent utilities that will take .osm or .pbf files and load them into a PostGIS-enabled database. This is great because it's not trivial to get all the information you need for routing without doing some kind of query, and it's really nice to be able to do that query in SQL rather than selectively reading an XML file to return only certain nodes. There's a not-too-bad tutorial on how to do this in a Python package I once wrote. Querying the Postgres database for related nodes is something I did in Python, but you could have this looking much prettier using dbplyr and the sf/PostGIS connector (my Python code is here).
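
To make the second suggestion (an event-based parser that keeps only what routing needs) a bit more concrete, here is a minimal sketch. It is in Python rather than R, only because that is what I used for this kind of work, and it uses nothing beyond the standard library. The function name read_road_graph and the sample data are made up for illustration, and it assumes plain OSM XML where node and way elements sit directly under the root and roads are ways carrying a "highway" tag. Treat it as a starting point under those assumptions, not a finished reader.

```python
import io
import xml.etree.ElementTree as ET


def read_road_graph(source, way_key="highway"):
    """Stream OSM XML; keep node coordinates and edges of ways tagged way_key.

    Hypothetical helper for illustration, not part of any package.
    """
    coords = {}  # node id -> (lat, lon)
    edges = []   # (from node id, to node id) for consecutive nodes of kept ways

    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element

    for event, elem in context:
        # Only act once a top-level element has been read completely.
        if event != "end" or elem.tag not in ("node", "way", "relation"):
            continue
        if elem.tag == "node":
            coords[elem.get("id")] = (float(elem.get("lat")), float(elem.get("lon")))
        elif elem.tag == "way":
            if any(t.get("k") == way_key for t in elem.findall("tag")):
                refs = [nd.get("ref") for nd in elem.findall("nd")]
                edges.extend(zip(refs, refs[1:]))
        # Drop everything parsed so far so the tree never grows with the file.
        root.clear()

    # Nodes come before ways in OSM XML, so unused nodes can only be
    # discarded after the pass is finished.
    used = {node_id for edge in edges for node_id in edge}
    coords = {node_id: coords[node_id] for node_id in used if node_id in coords}
    return coords, edges


# Tiny made-up sample standing in for a real .osm file.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="45.0" lon="7.0"/>
  <node id="2" lat="45.1" lon="7.1"><tag k="name" v="A"/></node>
  <node id="3" lat="45.2" lon="7.2"/>
  <node id="4" lat="45.3" lon="7.3"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><nd ref="3"/><tag k="highway" v="residential"/></way>
  <way id="11"><nd ref="1"/><nd ref="4"/><tag k="building" v="yes"/></way>
</osm>"""

coords, edges = read_road_graph(io.BytesIO(SAMPLE))
```

For a real file you would pass the path to the .osm file instead of the in-memory sample. Note that this still holds every node's coordinates in memory during the pass, so for a very large extract the database route in the third bullet may remain the better option. The resulting edge list is the kind of table a graph library can build a graph from before running a shortest-path query.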

I hope that helps!

Thank you paleolimbot!
All the information is welcome!!

Thank you andresrcs!!
