NPM Registry HTTP Semantics
Here at nice try, I’m working on an NPM proxy and internal registry. All is fine
and dandy until you notice that the NPM Registry, although it returns ETag
headers, simply ignores them. For instance, let’s try to get the latest metadata
for, say, react, and then use If-None-Match to check whether our copy is still
fresh.
First Request: No If-None-Match
GET /react HTTP/1.1
Host: registry.npmjs.org
Connection: close
User-Agent: [REDACTED]/0.1.0
The response is as expected:
HTTP/1.1 200 OK
Date: Thu, 30 Jan 2025 12:35:30 GMT
Content-Type: application/json
Content-Length: 5387821
Connection: close
CF-Ray: 90a18092cb3802e5-GRU
CF-Cache-Status: HIT
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Age: 75
Cache-Control: public, max-age=300
ETag: "5611f329debbb41fda058497d0d4c7d8"
Last-Modified: Wed, 29 Jan 2025 16:20:20 GMT
Vary: accept-encoding, accept
Server: cloudflare
Fancy stuff. Very nice. Even an ETag! So the expectation is that If-None-Match
yields an HTTP 304, right? Let’s try it again:
GET /react HTTP/1.1
If-None-Match: "5611f329debbb41fda058497d0d4c7d8"
Host: registry.npmjs.org
Connection: close
User-Agent: [REDACTED]/0.1.0
And…
HTTP/1.1 200 OK
Date: Thu, 30 Jan 2025 12:37:37 GMT
Content-Type: application/json
Content-Length: 5387821
Connection: close
CF-Ray: 90a183af0d59ae90-GRU
CF-Cache-Status: HIT
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Age: 202
Cache-Control: public, max-age=300
ETag: "5611f329debbb41fda058497d0d4c7d8"
Last-Modified: Wed, 29 Jan 2025 16:20:20 GMT
Vary: accept-encoding, accept
Server: cloudflare
200 OK? My If-None-Match carries exactly the same ETag returned in the response.
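For reference, this is roughly the flow a caching client expects from a conditional request. It is a minimal Python sketch, not code from the registry or the NPM CLI: the `fetch` callable and the `revalidate` name are hypothetical, and the transport is injected so the logic can be read without any network code.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: (url, request_headers) -> (status, response_headers, body).
# Response header names are assumed to be spelled exactly as in the dumps above.
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def revalidate(fetch: Fetch, url: str, cached_etag: Optional[str],
               cached_body: Optional[bytes]) -> Tuple[Optional[str], bytes, bool]:
    """Issue a conditional GET and return (etag, body, reused_cache)."""
    request_headers: Dict[str, str] = {}
    if cached_etag is not None and cached_body is not None:
        # Only send the validator when we actually hold a copy to fall back on.
        request_headers["If-None-Match"] = cached_etag

    status, response_headers, body = fetch(url, request_headers)

    if status == 304 and cached_body is not None:
        # The server confirmed our copy is current; no payload was transferred.
        return cached_etag, cached_body, True
    if status == 200:
        # Full response: replace whatever we had cached.
        return response_headers.get("ETag"), body, False
    raise RuntimeError(f"unexpected status {status} for {url}")
```

With the behavior observed above, only the `200` branch ever runs, so the full payload is transferred on every call.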
Second Request: Should HEAD help us?
Okay, but WHAT IF we issued a HEAD in order to compare ETags? That would be
enough to assert we have the same payload, right?
HEAD /react HTTP/1.1
Host: registry.npmjs.org
Connection: close
User-Agent: [REDACTED]/0.1.0
HTTP/1.1 200 OK
Date: Thu, 30 Jan 2025 12:39:41 GMT
Content-Type: application/json
Connection: close
CF-Ray: 90a186b3687af23f-GRU
CF-Cache-Status: HIT
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Age: 24
Cache-Control: public, max-age=300
ETag: W/"5611f329debbb41fda058497d0d4c7d8"
Last-Modified: Wed, 29 Jan 2025 16:20:20 GMT
Vary: accept-encoding, accept
Server: cloudflare
Wha-What? A Weak ETag? How? Why?
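For what it’s worth, the W/ prefix by itself should not get in the way: RFC 9110 specifies that If-None-Match is evaluated with the weak comparison function, which ignores the weakness indicator. A small Python sketch of the two comparison functions follows; the function names are mine, not from any library.

```python
def opaque_tag(etag: str) -> str:
    """Strip the weak indicator, leaving the quoted opaque tag."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags are equal; weakness is ignored."""
    return opaque_tag(a) == opaque_tag(b)


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and both are identical."""
    a, b = a.strip(), b.strip()
    return not a.startswith("W/") and not b.startswith("W/") and a == b
```

Under weak comparison, the ETag from the HEAD response matches the one from the GET response; under strong comparison it does not.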
How NPM CLI handles this
After some digging, it seems NPM’s CLI is quite… simple. It internally caches the contents for 5 minutes. After that, it just downloads it again.
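To make that strategy concrete, here is a minimal Python sketch of a time-based cache as described. It is not the CLI’s actual code: `TtlCache`, `download`, and `clock` are hypothetical names, and the 300-second TTL is simply the five minutes mentioned above.

```python
import time
from typing import Callable, Dict, Tuple

TTL_SECONDS = 300  # five minutes, as described above


class TtlCache:
    """Serve a stored body until it is TTL_SECONDS old, then download again."""

    def __init__(self, download: Callable[[str], bytes],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._download = download  # hypothetical function performing the full GET
        self._clock = clock        # injectable so expiry can be exercised in tests
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < TTL_SECONDS:
            return entry[1]  # still within the TTL: no request at all
        # Expired or missing: fetch the whole payload again, no revalidation.
        body = self._download(url)
        self._entries[url] = (now, body)
        return body
```

Note that nothing here looks at the ETag; once the entry expires, the whole payload is fetched again.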
Workarounds
We could (which does not mean we should) read the initial headers and drop
the connection. That would not stop the payload from being sent on a GET
request, though; we would only be breaking HTTP semantics. After all, the remote
will be shoving the payload down our connection nonetheless.
Another option would be to perform the same GET, but using Range to avoid
getting the full body, as it seems the server supports it. Let’s try:
Leveraging the Range Header
The server seems to support byte ranges, as announced by the header
Accept-Ranges: bytes. So let’s ask for a single byte. We will waste a single
byte, but at least it’s not 30-50MB depending on the manifest. I’ll keep the
If-None-Match just for good measure.
GET /react HTTP/1.1
If-None-Match: "5611f329debbb41fda058497d0d4c7d8"
Range: bytes=0-0
Host: registry.npmjs.org
Connection: close
User-Agent: [REDACTED]/0.1.0
HTTP/1.1 206 Partial Content
Date: Thu, 30 Jan 2025 12:54:36 GMT
Content-Type: application/json
Content-Length: 1
Connection: close
Content-Range: bytes 0-0/5387821
CF-Ray: 90a19c9078a51b24-GRU
CF-Cache-Status: HIT
Access-Control-Allow-Origin: *
Age: 11
Cache-Control: public, max-age=300
ETag: "5611f329debbb41fda058497d0d4c7d8"
Last-Modified: Wed, 29 Jan 2025 16:20:20 GMT
Vary: accept-encoding, accept
Set-Cookie: _cfuvid=hY7zxXExF2P4mja1cX6ynhOfcoAuJzxR42SmGTiLJvY-1738241676925-0.0.1.1-604800000; path=/; domain=.npmjs.org; HttpOnly; Secure; SameSite=None
Server: cloudflare
Success! We got a single byte and a 206 status response. Is that enough? I
honestly don’t think so, considering that the server returns an ETag. But if
that’s the only alternative, I’ll happily stick with it, although it feels like
a really, really hacky solution.
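Here is a minimal Python sketch of how the one-byte probe could be turned into a freshness check. It assumes the proxy stored the ETag and total length of its cached copy; `parse_content_range` and `is_unchanged` are hypothetical helper names, and header names are assumed to be spelled as in the response above.

```python
import re
from typing import Dict, Optional, Tuple

CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse 'bytes first-last/total'; total is None when the server sends '*'."""
    match = CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


def _opaque(etag: str) -> str:
    """Drop a leading W/ so weak and strong forms of the same tag compare equal."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def is_unchanged(status: int, headers: Dict[str, str],
                 cached_etag: str, cached_length: int) -> bool:
    """Decide from a 'Range: bytes=0-0' probe whether the cached copy is current."""
    if status == 304:
        return True  # what we would get if If-None-Match were honored
    if status != 206 or "Content-Range" not in headers:
        return False  # no usable probe result: treat the cache as stale
    _, _, total = parse_content_range(headers["Content-Range"])
    same_tag = _opaque(headers.get("ETag", "")) == _opaque(cached_etag)
    return same_tag and total == cached_length
```

Checking the total length alongside the ETag is an extra sanity check of my own choosing, not something the registry documents.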
Contacting NPM Support
I raised a ticket with NPM support to understand what is going on, but so far, I haven’t found a suitable alternative.
Will update this once I get a response from the fine folks at NPM.
One month later, I got the following reply from NPM Support:
So sorry for the oversight and delay addressing your concerns.
As previously mentioned, with the behavior you're seeing with the If-None-Match
header for registry.npmjs.org, it’s not expected for it to always send the full
payload when the ETag matches. Normally, if the If-None-Match header matches the
ETag returned by the registry, the server should respond with a 304 Not Modified
status, not the full payload.
It might be worth checking if there's anything else in the request (such as
additional headers or network issues) affecting the behavior.
As we’ve reached the maximum allowable time for keeping this ticket open, we
will be closing it. However, the issue is documented and escalated it to our
engineering team for further resolution.
We will continue working on this and encourage you to check periodically for any
updates on the status. I'll reach out with any updates.
We truly appreciate understanding throughout this process.
So… Not fixed, and perhaps won’t be. Bummer.