NPM Registry HTTP Semantics
Here at nice try, I’m working on an NPM proxy and internal registry. All is fine and dandy until you notice that the NPM Registry, although it returns ETag headers, simply ignores them. For instance, let’s try to get the latest version of, say, react, and then use If-None-Match to get an indication that our copy is still fresh.
First Request: No If-None-Match
GET /react HTTP/1.1
Host: registry.npmjs.org
Connection: close
User-Agent: [REDACTED]/0.1.0
The response is as expected:
HTTP/1.1 200 OK
Date: Thu, 30 Jan 2025 12:35:30 GMT
Content-Type: application/json
Content-Length: 5387821
Connection: close
CF-Ray: 90a18092cb3802e5-GRU
CF-Cache-Status: HIT
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Age: 75
Cache-Control: public, max-age=300
ETag: "5611f329debbb41fda058497d0d4c7d8"
Last-Modified: Wed, 29 Jan 2025 16:20:20 GMT
Vary: accept-encoding, accept
Server: cloudflare
Fancy stuff. Very nice. Even an ETag! So the expectation is that If-None-Match yields an HTTP 304, right? Let’s try it again:
GET /react HTTP/1.1
If-None-Match: "5611f329debbb41fda058497d0d4c7d8"
Host: registry.npmjs.org
Connection: close
User-Agent: [REDACTED]/0.1.0
And…
HTTP/1.1 200 OK
Date: Thu, 30 Jan 2025 12:37:37 GMT
Content-Type: application/json
Content-Length: 5387821
Connection: close
CF-Ray: 90a183af0d59ae90-GRU
CF-Cache-Status: HIT
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Age: 202
Cache-Control: public, max-age=300
ETag: "5611f329debbb41fda058497d0d4c7d8"
Last-Modified: Wed, 29 Jan 2025 16:20:20 GMT
Vary: accept-encoding, accept
Server: cloudflare
OK? My If-None-Match is exactly the same ETag returned in the response.
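For anyone who wants to reproduce the exchange above, here is a minimal sketch using Python’s standard http.client. The function names and the User-Agent value are made up for illustration; the host, path, and header names are the ones from the requests above.

```python
import http.client
from typing import Dict, Optional, Tuple


def revalidation_headers(etag: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, adding If-None-Match when an ETag is known."""
    headers = {
        "User-Agent": "example-client/0.1.0",  # hypothetical value
        "Connection": "close",
    }
    if etag:
        headers["If-None-Match"] = etag
    return headers


def conditional_get(
    host: str, path: str, etag: Optional[str] = None
) -> Tuple[int, Optional[str], Optional[bytes]]:
    """Send a (possibly conditional) GET.

    Returns (status, etag, body). The body is None when the server
    answers 304 Not Modified, which is what a conditional request
    with a matching ETag would normally produce.
    """
    conn = http.client.HTTPSConnection(host, timeout=30)
    try:
        conn.request("GET", path, headers=revalidation_headers(etag))
        resp = conn.getresponse()
        body = resp.read()
        if resp.status == 304:
            return resp.status, resp.getheader("ETag") or etag, None
        return resp.status, resp.getheader("ETag"), body
    finally:
        conn.close()
```

Calling conditional_get("registry.npmjs.org", "/react") and then calling it again with the returned ETag repeats the two requests shown above. Going by the responses captured here, the second call would come back with 200 and the full body rather than 304.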
Second Request: Should HEAD help us?
Okay, but WHAT IF we issued a HEAD in order to compare ETags? That would be enough to assert we have the same payload, right?
HEAD /react HTTP/1.1
Host: registry.npmjs.org
Connection: close
User-Agent: [REDACTED]/0.1.0
HTTP/1.1 200 OK
Date: Thu, 30 Jan 2025 12:39:41 GMT
Content-Type: application/json
Connection: close
CF-Ray: 90a186b3687af23f-GRU
CF-Cache-Status: HIT
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Age: 24
Cache-Control: public, max-age=300
ETag: W/"5611f329debbb41fda058497d0d4c7d8"
Last-Modified: Wed, 29 Jan 2025 16:20:20 GMT
Vary: accept-encoding, accept
Server: cloudflare
Wha-What? A Weak ETag? How? Why?
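For context, the W/ prefix marks a weak validator. HTTP defines two ways to compare entity tags: the strong comparison requires both tags to be strong and identical, while the weak comparison only looks at the opaque part. If-None-Match is specified to use the weak comparison, so the tag from the HEAD response and the one from the GET response still count as a match. A small sketch of both comparisons (the function names are mine, not from any library):

```python
from typing import Tuple


def parse_etag(value: str) -> Tuple[bool, str]:
    """Split an entity-tag into (is_weak, opaque_tag), quotes included."""
    value = value.strip()
    if value.startswith("W/"):  # the weak prefix is case-sensitive
        return True, value[2:]
    return False, value


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and the opaque tags are equal."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags are equal, weak flags are ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

With the values from the responses above, weak_match says the GET and HEAD tags are equivalent, while strong_match does not.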
How NPM CLI handles this
After some digging, it seems NPM’s CLI is quite… simple. It internally caches the contents for 5 minutes. After that, it just downloads it again.
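Five minutes happens to line up with the Cache-Control: public, max-age=300 header in the responses above. The following is not the CLI’s actual code, only a sketch of what a purely time-based freshness check looks like, using the Cache-Control and Age values a response carries. The function names are hypothetical.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age directive in seconds, or None if it is absent."""
    match = re.search(r"(?:^|[,\s])max-age=(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def remaining_freshness(cache_control: str, age: int = 0) -> int:
    """Seconds the response may still be served from cache without refetching.

    `age` is the value of the Age header, i.e. how long the response has
    already been sitting in an upstream cache.
    """
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        return 0
    return max(0, max_age - age)
```

For the first response above (Age: 75, max-age=300), remaining_freshness("public, max-age=300", 75) gives 225 seconds. Once that reaches zero, a client with no working revalidation has little choice but to download the whole manifest again.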
Workarounds
We could (which does not mean we should) read the initial headers and then drop the connection. That does not mean we won’t receive the payload on a GET request, though; we would only be breaking HTTP semantics. After all, the remote will be shoving the payload down our connection regardless.
Another option would be to perform the same GET, but with a Range header so we don’t get the full body, as it seems the server supports it. Let’s try:
Leveraging the Range Header
The server seems to support byte ranges, as announced by the header Accept-Ranges: bytes. So let’s ask for a single byte. We will waste a single byte, but at least it’s not 30-50 MB, depending on the manifest. I’ll keep the If-None-Match just for good measure.
GET /react HTTP/1.1
If-None-Match: "5611f329debbb41fda058497d0d4c7d8"
Range: bytes=0-0
Host: registry.npmjs.org
Connection: close
User-Agent: [REDACTED]/0.1.0
HTTP/1.1 206 Partial Content
Date: Thu, 30 Jan 2025 12:54:36 GMT
Content-Type: application/json
Content-Length: 1
Connection: close
Content-Range: bytes 0-0/5387821
CF-Ray: 90a19c9078a51b24-GRU
CF-Cache-Status: HIT
Access-Control-Allow-Origin: *
Age: 11
Cache-Control: public, max-age=300
ETag: "5611f329debbb41fda058497d0d4c7d8"
Last-Modified: Wed, 29 Jan 2025 16:20:20 GMT
Vary: accept-encoding, accept
Set-Cookie: _cfuvid=hY7zxXExF2P4mja1cX6ynhOfcoAuJzxR42SmGTiLJvY-1738241676925-0.0.1.1-604800000; path=/; domain=.npmjs.org; HttpOnly; Secure; SameSite=None
Server: cloudflare
{
Success! We got a single byte and a 206 status response. Is that enough? I honestly don’t think so, considering that the server returns an ETag. But if that’s the only alternative, I’ll happily stick with it, although it feels like a really, really hacky solution.
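Here is a rough sketch of how the single-byte probe could be turned into a “has the manifest changed?” check, again with Python’s standard library. It assumes the ETag and the total size from Content-Range are good enough to identify a manifest version, which is my assumption and not something the registry promises. All names below are hypothetical.

```python
import http.client
import re
from typing import NamedTuple, Optional


class Probe(NamedTuple):
    status: int
    etag: Optional[str]
    total_size: Optional[int]


def parse_total_size(content_range: Optional[str]) -> Optional[int]:
    """Extract the total length from a value like 'bytes 0-0/5387821'."""
    if not content_range:
        return None
    match = re.fullmatch(r"bytes \d+-\d+/(\d+)", content_range.strip())
    return int(match.group(1)) if match else None


def opaque_tag(etag: Optional[str]) -> Optional[str]:
    """Drop the weak prefix so weak and strong forms compare equal."""
    if etag is None:
        return None
    return etag[2:] if etag.startswith("W/") else etag


def probe_manifest(host: str, path: str) -> Probe:
    """Request only the first byte and collect the validators."""
    conn = http.client.HTTPSConnection(host, timeout=30)
    try:
        conn.request("GET", path, headers={
            "Range": "bytes=0-0",
            "User-Agent": "example-client/0.1.0",  # hypothetical value
            "Connection": "close",
        })
        resp = conn.getresponse()
        if resp.status == 206:
            resp.read()  # a single byte when the range is honoured
        # On any other status the body is left unread and the
        # connection is simply closed below.
        return Probe(resp.status, resp.getheader("ETag"),
                     parse_total_size(resp.getheader("Content-Range")))
    finally:
        conn.close()


def is_unchanged(result: Probe, cached_etag: str, cached_size: int) -> bool:
    """True when the probe matches what we stored for the cached copy."""
    return (
        result.status == 206
        and opaque_tag(result.etag) == opaque_tag(cached_etag)
        and result.total_size == cached_size
    )
```

A proxy would store the ETag and size alongside the cached manifest, call probe_manifest("registry.npmjs.org", "/react") once the cached copy goes stale, and only issue a full GET when is_unchanged returns False.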
Contacting NPM Support
I raised a ticket with NPM support to understand what is going on, but so far, I haven’t found a suitable alternative.
Will update this once I get a response from the fine folks at NPM.