Why does st_transform() re-set coordinates to (0,0)?

Huge impostor syndrome vibes, as I am working with this data frequently, but I still don't understand it.

I have a .geojson file containing line features.

st_crs(streets_import)
Coordinate Reference System:
  User input: WGS 84 / Pseudo-Mercator 
  wkt:
PROJCRS["WGS 84 / Pseudo-Mercator",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["Popular Visualisation Pseudo-Mercator",
        METHOD["Popular Visualisation Pseudo Mercator",
            ID["EPSG",1024]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["unknown"],
        AREA["World - 85°S to 85°N"],
        BBOX[-85.06,-180,85.06,180]],
    ID["EPSG",3857]]

I need it to be in 4326, and so I naturally do:

st_transform(streets_import, 4326)

However, the result of this is that all coordinates re-set to (0,0) in my lower left corner.

If, on the other hand, I set the crs, I get the correct behavior:

st_crs(streets_import) <- 4326

Everything else works as expected now, but I have a feeling I've done something wrong.


My question is why st_transform() behaves this way, resetting coords to zero?
(Bonus question: I am really confused by the difference between EPSG 4326 and 3857. Like, one is geographic and one is projected, but both of them capture all/most of the Earth and use the same numeric values for coordinates - absolute lat and long. Where am I confused?)


(I can provide a reprex, but I felt like this question is simple enough for someone to explain to me without needing a reprex. In other words, it's not an issue with data and it is something obvious to most geospatial professionals)

It is hard to make 100% certain without having your original data; but my gut feeling is that your data file is malformed.

To give you a bit background:

  • st_tranform(something, 4326) produces a WGS84 output from an input it well understands (so that it can get the starting point for all the transformations)
  • st_crs(something) <- 4326 sets the CRS of something object to WGS84, without applying any transformations
    But it - crucially - gives {sf} & friends a firm understanding of the reference frame, providing a starting point for any transformations. Note also that this call can be written in a pipe friendly way as st_set_crs(something, 4326)

Thus if you have a .geojson which misbehaves, but setting its CRS to 4326 without doing any transformation whatsoever makes it behave as expected, makes me think about malformed input.

To verify my hypothesis you can run st_crs(streets_import) - it should not give any meaningful answer (thus confirming that {sf} & friends do have a proper reference frame for your coordinates - it is just numbers; but do they mean degrees, meters, or survey feet?)

Thank you.
I did attach the output of st_crs(streets_import) in my initial post, it seems to be fairly comprehensive.

The geojson file is generated by Uber's Movement Data Tollkit: https://www.npmjs.com/package/movement-data-toolkit


Here's a tiny sample of the geojson:

{"features":
	[
		{"geometry":
			{"coordinates":[
				[30.629361683629536,50.435336107438744],
				[30.629568483629537,50.43539060743875]],
			"type":"LineString"
			},
		"type":"Feature",
		"properties":{
			"osmstartnodeid":4781689317,
			"osmhighway":"service",
			"osmendnodeid":961573429,
			"osmwayid":82624498}
			},
		{"geometry":
			{"coordinates":[
				[30.629559316370464,50.43542539256126],
				[30.629352516370464,50.43537089256125]],
			"type":"LineString"
			},
		"type":"Feature",
		"properties":{
			"osmstartnodeid":961573429,
			"osmhighway":"service",
			"osmendnodeid":4781689317,
			"osmwayid":82624498}
			}
	],
"crs":{
	"type":"name",
	"properties":{
		"name":"urn:ogc:def:crs:EPSG::3857"
		}
	},
"type":"FeatureCollection"
}

Notice the last couple of lines defining the coord system.

As you can see, absolute coordinates are listed. After running st_transform(sample, 4326), the coordinates change to something very close to zero.

Sorry to have misread your original post; the st_crs() output clearly was there. My bad.

But the point stands - the behaviour you describe is typical in situations when {sf} loses track of your coordinate reference system, and mistakes degrees for meters (30 & 50 meters from origin is still very close to the Null Island, 30 & 50 degrees not anymore).

Setting st_crs() of your object does nothing in itself other than that it rectifies the reference system, so that the thirty and fifty in your coordinates are understood as degrees.

A slight confusion might come out of the fact that both 4326 and 3857 are in degrees, so the "right & proper" approach should in theory be:

streets_import %>%
  st_set_crs(3857) %>% # set the reference frame to 3857 correctly
  st_transform(4326) # transform from 3857 to 4326

But in practice your approach is equivalent; the key is making {sf} interpret the units of your coordinate system as degrees.

2 Likes

Thanks!
This makes sense now!

My only remaining question is about this bit

Is this a problem with my geojson file, or with {sf}?

BTW, this still fails...

This works:

streets_import %>%
  st_set_crs(4326)

I have, out of curiosity, looked up the GeoJSON format specs - it seems that the authors have given up on the entire CRS issue, see: https://tools.ietf.org/html/rfc7946#page-12

But back to my hypotesis that the GeoJSON was malformed. I have saved your sample as asdf.json and imported it locally.

library(sf)
library(dplyr)

# read the data in as it was
sample <- st_read("asdf.json")

# bouding box of the original file
st_bbox(sample)
    xmin     ymin     xmax     ymax 
30.62935 50.43534 30.62957 50.43543 

# throw out the old CRS and interpret all figures as decimal degrees
mod_sample <- st_set_crs(sample, 4326) 

# tranform it "back" to 3857
check <- st_transform(mod_sample, 3857)

# this would be a correct bounding box, if the json really was in 3857
st_bbox(check)
   xmin    ymin    xmax    ymax 
3409644 6522013 3409668 6522028 

# double check by viewing on screen 
mapview::mapview(check)

And this is what I get - it looks like a small street somewhere in Kyiv, which I get is an expected outcome.

And on a personal note I find it greatly amusing that Kyiv has a boulevard named after Jaroslav Hašek.

2 Likes

Stupid, stupid me! 3857 is not in degrees, but in meters. Of course - https://epsg.io/3857

Sorry for the confusion. I am too ashamed to edit now, but please disregard that piece of code.

BTW, this still fails...

1 Like

@jlacko, wow, thanks a lot!
So, to sum up, if I understand it correctly, my .geojson file says it is 3857, but it is lying: it is in 4326. Therefore, the coordinate reference system shouldn't be transformed, it should be set as 4326. I.e. "I know you say you are 3857, but you're lying, I hereby christen you 4326!"


And yes, my sample includes 2 Kyiv streets: full geojson file contained all Kyiv streets, but it was a 90Mb file, and so I simply edited the rest out and left only 2 streets on random. I actually lived not too far from Hašek blvd when I lived in Kyiv. And Hašek himself lived in Kyiv for 2 years (only in much more affluent neighborhood in the historic city center, not on the outskirts of the city where they named a street after him LOL).

What are the odds that out of ~200,000 streets in the geojson file, I'd keep that one, and someone from Czechia would be helping me review the error...

Glad to be of service!

And to rephrase the "lying 3857 bit" - all vector based geo data formats contain two sets of information:

  • a bunch of pairs of coordinates (a pair marks a point, and points can be then assembled into a line, polygon or what not)
  • a reference system that explains how the pairs of coordinates should be interpreted (things such as whether the coordinates should be understood as in meters, degrees or US survey feet, where is the origin and what shape is the Earth)

The second part is highly standardized, hence the the EPSG codes (EPSG stands for European Petroleum Survey Group, a now defunct organization that was an early standards setter; for in surveying for oil there must be absolute order). It was this second part that was wrong in your GeoJSON file.

Your file tried to tell you was that the pairs of coordinates should be read as meters north and east from the Null Island (this is what 3857 means). And what you did was you told your {sf} to disregard this information, and interpret the coordinates as decimal degrees.

The pairs of coordinates remained as they were, but your interpretation of their meaning has changed.


And having said that I too find the coincidence funny. As someone living in Prague and working in tech I am used to having friends & colleagues originally from Ukraine - it is not a very happy place at the moment, and the ties between the two countries run deep - but still, *this* was a stretch.

As I wrote earlier, I am glad I could help.

2 Likes

All that's left of Atlantis?

image

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.