I've been using Puppeteer to query and read responses from deepseek.com. It works really well, but I have to use a stealth mode and a "headed" version to make it think it's a person.
The notion of real-world TLS/HTTP fingerprinting was somewhat new to me, and it looks interesting in theory, but I wonder what the build's use case really is? I mean you have the heavy-handed JavaScript running everywhere now.
There are APIs that Chrome provides that allow servers to validate whether the request came from an official Chrome browser. That would detect that this curl isn't really Chrome.
It’d be nice if something could support curl’s arguments but drive an actual headless chrome browser.
Siblings are being more charitable about this, but I just don't think what you're suggesting is even possible.
An HTTP client sends a request. The server sends a response. The request and response are made of bytes. Any bytes Chrome can send, curl-impersonate could also send.
Chromium is open source. If there was some super secret handshake, anyone could copy that code to curl-impersonate. And if it's only in closed-source Chrome, someone will disassemble it and copy it over anyway.
>Chromium is open source. If there was some super secret handshake, anyone could copy that code to curl-impersonate. And if it's only in closed-source Chrome, someone will disassemble it and copy it over anyway.
Not if the "super secret handshake" is based on hardware-backed attestation.
It doesn't matter how complicated the operation is, if you have a copy of the Chrome binary, you can observe what CPU instructions it uses to sign the challenge, and replicate the operations yourself. Proxying to a real Chrome is the most blunt approach, but there's nothing stopping you from disassembling the binary and copying the code to run in your own process, independent of Chrome.
> you can observe what CPU instructions it uses to sign the challenge, and replicate the operations yourself.
No you can't, that's the whole thing with homomorphic encryption. Ask GPT to explain to you why that's so.
You have no way of knowing the bounds of the code I will access from inside the homomorphically encrypted code. Depending on the challenge I can query parts of the binary and hash them in the response. So you will need to replicate the whole binary.
Similar techniques are already used today by various copy-protection/anti-cheat game protectors. Most of them remain unbroken.
You're just describing ordinary challenge-response. That has nothing to do with homomorphic encryption and there are plenty of examples from before homomorphic encryption became viable, for example https://www.geoffchappell.com/notes/security/aim/index.htm
Homomorphic encryption hides data, not computation. If you've been trying to learn compsci from GPT, you might have fallen victim to hallucinations. I'd recommend starting from wikipedia instead. https://en.wikipedia.org/wiki/Homomorphic_encryption
And btw most games are cracked within a week of release. You have way too much faith in buzzwords and way too little faith in bored Eastern European teenagers.
That snippet has nothing to do with homomorphic encryption. It's just the same kind of challenge-response AIM and many others were already doing in the 90s.
You seem convinced that homomorphic encryption is some kind of magic that prevents someone from observing their own hardware, or from running Chrome under a debugger. That's just not true. And I suspect we don't share enough of a common vocabulary to have a productive discussion, so I'll end it here.
I don't believe this is correct. Homomorphic encryption enables computation on encrypted data without needing to decrypt it.
You can't use the result of that computation without first decrypting it though. And you can't decrypt it without the key. So what you describe regarding memory addresses is merely garden variety obfuscation.
Unmasking an obfuscated set of allowable address ranges for hashing given an arbitrary binary is certainly a difficult problem. However as you point out it is easily sidestepped.
You are also mistaken about anti-cheat measures. The ones that pose the most difficulty primarily rely on kernel mode drivers. Even then, without hardware attestation it's "just" an obfuscation effort that raises the bar to make breaking it more time consuming.
What you're actually witnessing there is that if a sufficient amount of effort is invested in obfuscation and those efforts carried out continuously in order to regularly change the obfuscation then you can outstrip the ability of the other party to keep up with you.
Set a UA and any headers and/or cookies with regular cURL compiled with HTTP/3. This can be done with wrapper scripts very easily. 99.999% of problems solved with no special magic buried in an unclean fork.
You should really read the "Why" section of the README before jumping to conclusions:
```
some web services use the TLS and HTTP handshakes to fingerprint which client is accessing them, and then present different content for different clients. These methods are known as TLS fingerprinting and HTTP/2 fingerprinting respectively. Their widespread use has led to the web becoming less open, less private and much more restrictive towards specific web clients
With the modified curl in this repository, the TLS and HTTP handshakes look exactly like those of a real browser.
```
For example, this will get you past Cloudflare's bot detection.
The README indicates that this fork is compiled with NSS (from Firefox) and BoringSSL (from Chromium) to resist fingerprinting based on the TLS library. CLI flags won't do that.
There's a fork of this that has some great improvements over to the top of the original and it is also actively maintained: https://github.com/lexiforest/curl-impersonate
There's also Python bindings for the fork for anyone who uses Python: https://github.com/lexiforest/curl_cffi
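For anyone who hasn't used it, a minimal sketch of what the curl_cffi bindings look like. The exact set of impersonation targets (e.g. "chrome" vs. a versioned string like "chrome120") depends on the installed version, and the fingerprint-echo URL is just one public test service, so treat this as illustrative:

```
# Minimal curl_cffi sketch: the request is made with a browser-like TLS and
# HTTP/2 handshake instead of the default python/libcurl one.
from curl_cffi import requests

r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome")
print(r.status_code)
print(r.json().get("ja3_hash"))  # should look like a real Chrome hash, not Python's
```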
I suppose it does make sense that a "make curl look like a browser" program would get sponsored by "bypass bot detection" services...
Easy. Just make a small fragment shader to produce a token in your client. No bot is going to waste GPU resources to compile your shader.
Why do people even think this? Bots almost always just use headful instrumented browsers now. If a human sitting at a keyboard can load the content, so can a bot.
Security measures never prevent all abuse. They raise the cost of abuse above an acceptable threshold. Many things work like this. Cleaning doesn't eliminate dirt, it dilutes the dirt below an acceptable threshold. Same for "repairing" and "defects", and some other pairs of things that escape me atm.
That's the same argument as CAPTCHAs: as far as I know there are no bots protesting that they make their lives harder, but as a human, my life is much harder than it needs to be because things keep making me prove I'm a human.
Clean for data ingestion usually means complicated for data creation - optimizing for the advertisers has material cash value downstream, but customers are upstream, and making it harder is material too.
What is so hard about running a fragment shader after the site has loaded?
I have to assume /s, but lacking that -- why can't you just allow `curl`? You need a human for advertising dollars, or as a poor mechanism of rate limiting. I want to use your service. If you're buying me a fragment shader, I guess that's fine, but I'm feeding it to the dogs, not plugging your rando hardware into my web browser.
We are talking about Curl bots here. How is what you are saying relevant?
no, nyanpasu64's comment extended the discussion to general bot detection
Can't they use a software renderer like swiftshader? You don't need to pass in an actual gpu through virtio or whatever.
Maybe you can call a WebGL extension that isn't supported. Or better yet have a couple of overdraws of quads. Their bot will handle it, but it will throttle their CPU like gangbusters.
Sounds like a PoW system with extra steps?
Can't a bot just collect a few real tokens and then send those instead of trying to run the shader?
How do you automate that? Just generate a new token for each day.
There's also a module for fully integrating this with the Python requests library: https://github.com/el1s7/curl-adapter
All these "advanced" technologies that change faster than I can turn my neck, to make a simple request that looks like it was one of the "certified" big 3 web browsers, which will ironically tax the server less than a certified browser. Is this the nightmare dystopia I was warned about in the 90's? I wonder if anyone here can name the one company that is responsible for this despite positioning themselves as a good guy open source / hacker community contributor.
I'm rooting for Ladybird to gain traction in the future. Currently, it is using cURL proper for networking. That is probably going to have some challenges (I think cURL is still limited in some ways, e.g. I don't think it can do WebSockets over h2 yet) but on the other hand, having a rising browser engine might eventually remove this avenue for fingerprinting since legitimate traffic will have the same fingerprint as stock cURL.
It would be good to see Ladybird's cURL usage improve cURL itself, such as the WebSocket over h2 example you mention. It is also a good test of cURL to see and identify what functionality cURL is missing w.r.t. real-world browser workflows.
but on the other hand, having a rising browser engine might eventually remove this avenue for fingerprinting
If what I've seen from Cloudflare et al. is any indication, it's the exact opposite --- the amount of fingerprinting and "exploitation" of implementation-defined behaviour has increased significantly in the past few months, likely in an attempt to kill off other browser engines; the incumbents do not like competition at all.
The enemy has been trying to spin it as "AI bots DDoSing" but one wonders how much of that was their own doing...
It's entirely deliberate. CloudFlare could certainly distinguish low-volume but legit web browsers from bots, as much as they can distinguish chrome/edge/safari/firefox from bots. That is if they cared to.
Hold up, one of those things is not like the other. Are we really blaming webmasters for 100x increases in costs from a huge wave of poorly written and maliciously aggressive bots?
> Are we really blaming...
No, they're discussing increased fingerprinting / browser profiling recently and how it affects low-market-share browsers.
I saw that, but I'm still not sure how this fits in:
> The enemy has been trying to spin it as "AI bots DDoSing" but one wonders how much of that was their own doing...
I'm reading that as `enemy == fingerprinters`, `that == AI bots DDoSing`, and `their own == webmasters, hosting providers, and CDNs (i.e., the fingerprinters)`, which sounds pretty straightforwardly like the fingerprinters are responsible for the DDoSing they're receiving.
That interpretation doesn't seem to match the rest of the post though. Do you happen to have a better one?
"their own" = CloudFlare and/or those who have vested interests in closing up the Internet.
Your costs only went up 100x if you built your site poorly
I'll bite. How do you serve 100x the traffic without 100x the costs? It costs something like 1e-10 dollars to serve a recipe page with a few photos, for example. If you serve it 100x more times, how does that not scale up?
I don't think they're doing this to kill off browser engines; they're trying to sift browsers into "user" and "AI slop", so they can prioritize users.
This is entirely a web crawler 2.0 apocalypse.
man i just want a bot to buy groceries for me
That’s one of the few reasons to leave the house. I’d like dishes and laundry bots first, please.
You mean dishwashers and washing machines?
Yes, but no. I want a robot to load and unload those.
I have been paying my local laundromat to do my laundry for over a decade now; it's probably cheaper than you're imagining and sooo worth it.
my household is 6 people, it isn't uncommon to run 3 washing machine loads in a day and days without at least one are rare. I can imagine the convenience, but at this scale it sounds a bit unreasonable.
dishwasher runs at least once a day, at least 80% full, every day, unless we're traveling.
I think "slop" only refers to the output of generative AI systems. bot, crawler, scraper, or spider would be a more apt term for software making (excessive) requests to collect data.
I used to call it "cURL", but apparently officially it is curl, correct?
I'd guess Daniel pronounces it as "kurl", with a hard C like in "crust", since he's Swedish.
As in “See-URL”? I’ve always called it curl but “see url” makes a hell of a lot of sense too! I’ve just never considered it and it’s one of those things you rarely say out loud.
I prefer cURL as well, but according to official sources it is curl. :D Not sure how it is pronounced though, I pronounce it as "see-url" and/or "see-U-R-L". It might be pronounced as "curl" though.
When I spoke to these guys [0] we touched on those quirks and foibles that make a signature (including TCP stack stuff beyond control of any userspace app).
I love this curl, but I worry that if a component takes on the role of deception in order to "keep up" it accumulates a legacy of hard to maintain "compatibility" baggage.
Ideally it should just say... "hey I'm curl, let me in"
The problem of course lies with a server that is picky about dress codes, and that problem in turn is caused by crooks sneaking in disguise, so it's rather a circular chicken and egg thing.
[0] https://cybershow.uk/episodes.php?id=39
> Ideally it should just say... "hey I'm curl, let me in"
What? Ideally it should just say "GET /path/to/page".
Sending a user agent is a bad idea. That shouldn't be happening at all, from any source.
Since the first browser appeared I've always thought that sending a user agent ID was a really bad idea. It breaks with the fundamental idea of the web protocol: it's the server's responsibility to provide data and it's the client's responsibility to present it to the user. The server does not need to know anything about the client. Including the user agent in this whole thing was a huge mistake, as it allowed web site designers to code for specific quirks in browsers. I can to some extent accept a capability list from the client, but I'm not so sure even that is necessary.
Absolutely, yes! A protocol should not be tied to client details. Where did "User Agent" strings even come from?
They're in the HTTP/1.0 spec. https://www.rfc-editor.org/rfc/rfc1945#section-10.15
10.15 User-Agent

The User-Agent request-header field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations.
What should instead happen is that Chrome should stop sending as much of a fingerprint, so that sites won't be able to fingerprint. That won't happen, since it's against Google's interests.
This is a fundamental misunderstanding of how TLS fingerprinting works. The "fingerprint" isn't from Chrome sending a "fingerprint: [random uuid]" attribute in every TLS negotiation. It's derived from various properties of the TLS stack, like which ciphers it can accept. You can't just "stop sending as much of a fingerprint" without every browser agreeing on the same TLS stack. It's already minimal as it is, because there's basically no aspect of the TLS stack that users can configure, and Chrome bundles its own, so you'd expect every Chrome user to have the same TLS fingerprint. It's only really useful to distinguish "fake" Chrome users (e.g. curl with a custom header set, or Firefox users with a user agent spoofer) from "real" Chrome users.
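To make that concrete, here is a rough sketch of how a JA3-style fingerprint is derived from ClientHello fields; the numeric values are illustrative placeholders, not a real Chrome ClientHello:

```
# JA3-style TLS fingerprinting: the hash is computed from fields the client
# sends in its ClientHello, not from any explicit identifier.
import hashlib

def ja3(version, ciphers, extensions, curves, point_formats):
    parts = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(parts).encode()).hexdigest()

# Two clients supporting the same ciphers but advertising them in a different
# order (or with a different extension set) get different fingerprints.
print(ja3(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0]))
print(ja3(771, [4866, 4865, 4867], [0, 23, 65281], [29, 23, 24], [0]))
```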
What? Just fix the ciphers to a list of what's known to work + some safety margin. Each user needing some different specific cipher (like a cipher for horses, and one for dogs), is not a thing.
>Just fix the ciphers to a list of what's known to work + some safety margin.
That's already the case. The trouble is that NSS (what firefox uses) doesn't support the same cipher suites as boringssl (what chrome uses?).
Part of the fingerprint is stuff like the ordering of extensions, which Chrome could easily do but AFAIK doesn’t.
(AIUI Google’s Play Store is one of the biggest TLS fingerprinting culprits.)
Chrome has randomized its ClientHello extension order for two years now.[0]
The companies to blame here are solely the ones employing these fingerprinting techniques, and those relying on services of these companies (which is a worryingly large chunk of the web). For example, after the Chrome change, Cloudflare just switched to a fingerprinter that doesn't check the order.[1]
[0]: https://chromestatus.com/feature/5124606246518784
[1]: https://blog.cloudflare.com/ja4-signals/
> blame here are solely the ones employing these fingerprinting techniques,
Sure. And it's a tragedy. But when you look at the bot situation and the sheer magnitude of resource abuse out there, you have to see it from the other side.
FWIW the conversation mentioned above, we acknowledged that and moved on to talk about behavioural fingerprinting and why it makes sense not to focus on the browser/agent alone but what gets done with it.
Last time I saw someone complaining about scrapers, they were talking about 100 GiB/month. That's about 300 kbps - less than $1/month in IP transit and ~$0 in compute. Personally I've never noticed bots show up on a resource graph. As long as you don't block them, they won't bother using more than a few IPs and they'll back off when they're throttled.
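For reference, the arithmetic behind that figure (just a sanity check of the bandwidth claim, not of the compute cost the replies question):

```
# 100 GiB spread evenly over a 30-day month, expressed as a bit rate.
bytes_total = 100 * 1024**3
seconds_per_month = 30 * 24 * 3600
kbps = bytes_total * 8 / seconds_per_month / 1000
print(f"{kbps:.0f} kbps")  # ~331 kbps
```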
For some sites, things are a lot worse. See, for example, Jonathan Corbet's report[0].
0 - https://social.kernel.org/notice/AqJkUigsjad3gQc664
How can you say it's $0 in compute without knowing if the data returned required any computation?
Didn't rachelbytheebay post recently that her blog was being swamped? I've heard that from a few self-hosting bloggers now. And Wikipedia has recently said more than half of its traffic is now bots. Are you claiming this isn't a real problem?
> The companies to blame here are solely the ones employing these fingerprinting techniques,
Let's not go blaming vulnerabilities on those exploiting them. Exploitation is also bad but being exploitable is a problem in and of itself.
> Let's not go blaming vulnerabilities on those exploiting them. Exploitation is also bad but being exploitable is a problem in and of itself.
There's "vulnerabilities" and there's "inherent properties of a complex protocol that is used to transfer data securely". One of the latter is that metadata may differ from client to client for various reasons, inside the bounds accepted in the standard. If you discriminate based on such metadata, you have effectively invented a new proprietary protocol that certain existing browsers just so happen to implement.
It's like the UA string, but instead of just copying a single HTTP header, new browsers now have to reverse engineer the network stack of existing ones to get an identical user experience.
I get that. I don't condone the behavior of those doing the fingerprinting. But what I'm saying is that the fact that it is possible to fingerprint should in pretty much all cases be viewed as a sort of vulnerability.
It isn't necessarily a critical vulnerability. But it is a problem on some level nonetheless. To the extent possible you should not be leaking information that you did not intend to share.
A protocol that can be fingerprinted is similar to a water pipe with a pinhole leak. It still works, it isn't (necessarily) catastrophic, but it definitely would be better if it wasn't leaking.
I'm sorry, but your comment shows you never had to fight this problem at scale. The challenge is not small-time crawlers; the challenge is blocking large / dedicated actors. The naive approach is simple: if there is more than X volume of traffic per <aggregation criteria>, block it. Problem: most aggregation criteria are trivially spoofable, or very cheap to change:

- IP: with IPv6 it is not an issue to rotate your IP often
- UA: changing this is scraping 101
- SSL fingerprint: easy to use the same one as everyone else
- IP stack fingerprint: also easy to use a common one
- request / session tokens: it's cheap to create a new session

You can force login, but then you have a spam account creation challenge, with the same issues as above, and depending on your infra this can become heavy.
Add to this that the minute you use a signal for detection, you “burn” it as adversaries will avoid using it, and you lose measurement thus the ability to know if you are fixing the problem at all.
I worked on this kind of problem for a FAANG service, whoever claims it’s easy clearly never had to deal with motivated adversaries
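As a concrete illustration of the per-<aggregation criteria> threshold blocking described above, here is a toy sliding-window counter; the key choice and the numbers are hypothetical, and the point above stands: cheap key rotation defeats it.

```
# Toy sliding-window rate limiter keyed by some aggregation criterion
# (IP, IPv6 /64, UA, JA3/JA4 hash, session token, or a combination).
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 600  # hypothetical per-key budget

hits = defaultdict(deque)

def allow(key, now=None):
    now = time.time() if now is None else now
    q = hits[key]
    while q and now - q[0] > WINDOW_SECONDS:   # drop hits outside the window
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False                           # over budget: block
    q.append(now)
    return True

# If the adversary rotates the key (new IP, new session, ...) every few
# requests, no single counter ever reaches MAX_REQUESTS.
print(allow("203.0.113.7"))
```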
What's the advantage of randomizing the order, when all Chrome users already have the same order? Practically speaking there are a bazillion ways to fingerprint Chrome besides TLS cipher ordering, so it's not worth adding random mitigations like this.
I'm hoping this means Ladybird might support ftp URLs.
and even the Gopher protocol!
Ladybird does not have the resources to be a contender to current browsers. It's well marketed but has no benefits or reason to exist over Chromium. It's also a major security risk, as it is designed yet again in demonstrably unsafe C++.
Did they also set IP_TTL to set the TTL value to match the platform being impersonated?
If not, then fingerprinting could still be done to some extent at the IP layer. If the TTL value in the IP layer is below 64, it is obvious this is either not running on modern Windows, or is running on a modern Windows machine that has had its default TTL changed, since by default the TTL of packets on modern Windows starts at 128 while most other platforms start it at 64. Since the other platforms do not have issues communicating over the internet, IP packets from modern Windows will always be seen by the remote end with TTLs at or above 64 (likely just above).
That said, it would be difficult to fingerprint at the IP layer, although it is not impossible.
>That said, it would be difficult to fingerprint at the IP layer, although it is not impossible.
Only if you're using PaaS/IaaS providers that don't give you low-level access to the TCP/IP stack. If you're running your own servers it's trivial to fingerprint all manner of TCP/IP properties.
https://en.wikipedia.org/wiki/TCP/IP_stack_fingerprinting
I meant it is difficult relative to fingerprinting TLS and HTTP. The information is not exported by the berkeley socket API unless you use raw sockets and implement your own userland TCP stack.
Couldn’t you just monitor the inbound traffic and associate the packets to the connections? Doing your own TCP seems silly.
Yeah, some sort of packet mirroring setup (eg. in iptables or at the switch level) + packet capture tool should be enough. Then you just need to join the data from the packet capture program/machine with your load balancer, using src ip + port + time.
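A rough sketch of that kind of out-of-band capture, assuming scapy and a host that sees the mirrored traffic (root/CAP_NET_RAW required); a real setup would log these fields keyed by source IP, port, and time to join against the load balancer's logs:

```
# Log IP/TCP-level properties of incoming SYNs (TTL, window size, TCP options)
# for later correlation with application-level requests.
from scapy.all import sniff, IP, TCP  # pip install scapy

def log_syn(pkt):
    if IP in pkt and TCP in pkt:
        print(pkt[IP].src, pkt[TCP].sport,
              "ttl=", pkt[IP].ttl,
              "window=", pkt[TCP].window,
              "options=", [name for name, _ in pkt[TCP].options])

# The BPF filter keeps only TCP SYNs to port 443.
sniff(filter="tcp dst port 443 and tcp[tcpflags] & tcp-syn != 0",
      prn=log_syn, store=False)
```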
Wouldn’t the TTL value of received packets depend on network conditions? Can you recover the client’s value from the server?
The argument is that if many (maybe the majority) of systems are sending packets with a TTL of 64 and they don't experience problems on the internet, then it stands to reason that almost everywhere on the internet is reachable in fewer than 64 hops (personally, I'd be amazed if any routes are actually as long as 32 hops).
If everywhere is reachable in under 64 hops, then packets sent from systems that use a TTL of 128 will arrive at the destination with a TTL still over 64 (or else they'd have been discarded for all the other systems already).
Windows 9x used a TTL of 32. I vaguely recall hearing that it caused problems in extremely exotic cases, but that could have been misinformation. I imagine that >99.999% of the time, 32 is enough. This makes fingerprinting via TTL to distinguish between those who set it at 32, 64, 128 and 255 (OpenSolaris and derivatives) viable. That said, almost nobody uses Windows 9x or OpenSolaris derivatives on the internet these days, so I used values from systems that they do use for my argument that fingerprinting via TTL is possible.
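A tiny illustration of the classification the last few comments describe: round the observed TTL up to the nearest common initial value (this assumes the path is shorter than 32 hops, per the argument above):

```
# Guess the sender's initial TTL from the TTL observed at the server.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)

def guess_initial_ttl(observed_ttl):
    return min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)

print(guess_initial_ttl(117))  # 128 -> consistent with modern Windows
print(guess_initial_ttl(54))   # 64  -> consistent with Linux/macOS/BSD
```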
What is the reasoning behind TTL counting down instead of up, anyway? Wouldn't we generally expect those routing the traffic to determine if and how to do so?
To allow the sender to set the TTL, right? Without adding another field to the packet header.
If you count up from zero, then you'd also have to include in every packet how high it can go, so that a router has enough info to decide if the packet is still live. Otherwise every connection in the network would have to share the same fixed TTL, or obey the TTL set in whatever random routers it goes through. If you count down, you're always checking against zero.
If your doctor says you have only 128 days to live, you count down, not up. TTL is time to live, which is the same thing.
The primary purpose of TTL is to prevent packets from looping endlessly during routing. If a packet gets stuck in a loop, its TTL will eventually reach zero, and then it will be dropped.
That doesn't answer my question. If it counted up then it would be up to each hop to set its own policy. Things wouldn't loop endlessly in that scenario either.
Then random internet routers could break internet traffic by setting it really low and the user could not do a thing about it. They technically still can by discarding all traffic whose value is less than some value, but they don’t. The idea that they should set their own policy could fundamentally break network traffic flows if it ever became practiced.
This is a wild guess but: I am under the impression that the early internet was built somewhat naively so I guess that the sender sets it because they know best how long it stays relevant for/when it makes sense to restart or fail rather than wait.
It does make traceroute, where each packet is fired with one more available step than the last, feasible, whereas 'up' wouldn't. Of course, then we'd just start with max-hops and walk the number down I suppose. I still expect it would be inconvenient during debugging for various devices to have various ceilings.
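For the curious, a bare-bones sketch of that traceroute mechanism: UDP probes with increasing TTL, listening for ICMP "time exceeded" replies. It needs root for the raw socket, and retries/error handling are omitted.

```
import socket

DEST = "example.com"
PORT = 33434  # traditional traceroute UDP port

dest_ip = socket.gethostbyname(DEST)
for ttl in range(1, 31):
    recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    recv.settimeout(2)
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)  # one more hop each round
    send.sendto(b"", (dest_ip, PORT))
    hop = None
    try:
        _, (hop, _) = recv.recvfrom(512)  # whichever router dropped the probe replies
        print(ttl, hop)
    except socket.timeout:
        print(ttl, "*")
    send.close()
    recv.close()
    if hop == dest_ip:
        break
```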
Wait a sec… if the TLS handshakes look different, would it be possible to have an nginx level filter for traffic that claims to be a web browser (eg chrome user agent), yet really is a python/php script? Because this would account for the vast majority of malicious bot traffic, and I would love to just block it.
Cloudflare uses JA3 and now JA4 TLS fingerprints, which are hashes of various TLS handshake parameters. https://github.com/FoxIO-LLC/ja4/blob/main/technical_details... has more details on how that works, and they do offer an Nginx module: https://github.com/FoxIO-LLC/ja4-nginx-module
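In rough terms, the check being asked about boils down to comparing the claimed User-Agent family against the TLS fingerprint actually presented (e.g. the JA3/JA4 value the nginx module above exposes). A minimal sketch; the hashes are placeholders you would have to collect from known-good browsers yourself:

```
# Flag requests whose User-Agent claims a browser but whose TLS fingerprint
# doesn't match any fingerprint seen from that browser family.
KNOWN_FINGERPRINTS = {
    "chrome":  {"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"},  # placeholder JA3 hashes
    "firefox": {"bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"},
}

def looks_spoofed(user_agent, ja3_hash):
    ua = user_agent.lower()
    for family, hashes in KNOWN_FINGERPRINTS.items():
        if family in ua:
            return ja3_hash not in hashes  # claims the browser, fingerprint disagrees
    return False  # unknown client: don't judge on TLS alone

print(looks_spoofed("Mozilla/5.0 ... Chrome/124.0 ...", "cccccccccccccccccccccccccccccccc"))
```

As the replies below note, maintaining that allowlist is exactly what ends up blocking less common browsers, so it is better treated as an under-attack measure than a default.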
That's basically what security vendors like Cloudflare do, except with even more fingerprinting, like a JavaScript challenge that checks the JS interpreter/DOM.
JS can check user agent things like screen and window dimensions as well, which legit browsers will have; bots will also present them, but with a more uniform and predictable set of x and y dimensions per set of source IPs. Lots of possibilities for JS endpoint fingerprinting.
I also present a uniform and predictable set of x and y dimensions per source IPs as a human user who maximizes my browser window
Maximizing reduces the variations, but there's still quite a bit of variation because of different display resolution + scaling settings + OS configuration (eg. short or tall taskbars).
Or settings like auto-hide MacOS dock vs not auto hide, affecting the vertical size of the browser window.
Well, I think that's what OP is meant to avoid you doing, exactly.
Yes, and sites are doing this and it absolutely sucks because it's not reliable and blocks everyone who isn't using the latest Chrome on the latest Windows. Please don't whitelist TLS fingerprints unless you're actually under attack right now.
If you're going to whitelist (or block at all really) please simply redirect all rejected connections to a proof of work scheme. At least that way things continue to work with only mild inconvenience.
I am very curious if the current wave of mystery distributed (AI?) bots will just run javascript and be able to get past proof of work too....
Based on the fact that they are requesting the same absolutely useless and duplicative pages (like every possible combination of query params, even when it does not lead to unique content) from me hundreds of times per URL, and are able to distribute so much that I'm only getting 1-5 requests per day from each IP...
...cost does not seem to be a concern for them? Maybe they won't actually mind ~5 seconds of CPU on a proof of work either? They are really a mystery to me.
I currently am using Cloudflare Turnstile, which incorporates proof of work but also various other signals. It is working, but I know it does have false positives. I am working on implementing a simpler nothing-but-JS proof of work (SHA-512-based), and am going to switch that in; if it works, great (because I don't want to keep out the false positives!), but if it doesn't, back to Turnstile.
The mystery distributed idiot bots were too much. (Scaling up resources -- they just scaled up their bot rates too!!!) I don't mind people scraping if they do it respectfully and reasonably; that's not what's been going on, and it's an internet-wide phenomenon of the past year.
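A sketch of the hash-based proof of work being described (the commenter's version is SHA-512 in client-side JS; this is the same idea in Python, with a made-up difficulty):

```
# The server hands out a random challenge and a difficulty; the client grinds
# nonces until the hash has enough leading zero bits; verification is one hash.
import hashlib
import os

def solve(challenge, difficulty_bits):
    target = 1 << (512 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha512(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge, nonce, difficulty_bits):
    digest = hashlib.sha512(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (512 - difficulty_bits))

challenge = os.urandom(16)
nonce = solve(challenge, 20)  # ~a million hashes on average
print(nonce, verify(challenge, nonce, 20))
```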
Blocking a hacking attack is not even a thing: they just change IP address each time they learn a new fact about how your system works, and progress smoothly without interruption until they exfiltrate your data. Same goes for scrapers, the only difference being there is no vulnerability to fix that will stop them.
This tool is pretty sweet in little bash scripts combo'd up with GNU parallel on red team engagements, for mapping HTTPS endpoints within scoped address ranges that will only respond to proper browsers (for whatever reason), or only when the SNI stuff is in order. Been finding it super sweet for that. It can do all the normal curl switches like -H for header spoofing.
I'm always ambivalent about things like this showing up here. On one hand, it's good to let others know that there is still that bit of rebelliousness and independence alive amongst the population. On the other hand, much like other "freedom is insecurity" projects, attracting unwanted attention may make it worse for those who rely on them.
Writing a browser is hard, and the incumbents are continually making it harder.
Your comment makes it sound like a browser being fingerprintable is a desired property by browser developers. It's just something that happens on its own from different people doing things differently. I don't see this as being about rebelliousness. Software being fingerprintable erodes privacy and software diversity.
Not all browsers, but Chrome certainly desires to be fingerprintable. They even try to cryptographically prove that the current browser is an unmodified Chrome with Web Environment Integrity [1].
Doesn't get more fingerprintable than that. They provide an un-falsifiable certificate that "the current browser is an unmodified Chrome build, running on an unmodified Android phone with secure boot".
If they didn't want to be fingerprintable, they could just not do that and spend all that engineering time and money on something else.
[1]: https://en.wikipedia.org/wiki/Web_Environment_Integrity
I had to do something like this with Ansible's get_url module once.
Was having issues getting the module to download an installer from a vendor's site.
Played with curl/wget, but was running into the same thing, while it worked from a browser.
I ended up getting both curl and get_url to work by passing the same headers my browser sent, such as User-Agent, encoding, etc.
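The same trick in Python, for anyone hitting the equivalent problem there: copy the headers a real browser sends and pass them explicitly. The URL and header values are illustrative, and as other comments note, this does nothing about TLS-level fingerprinting:

```
import requests

# Headers copied from a real browser session; nothing here is magic.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate",
}

r = requests.get("https://vendor.example.com/installer.exe", headers=headers)
r.raise_for_status()
with open("installer.exe", "wb") as f:
    f.write(r.content)
```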
"For these reasons, some web services use the TLS and HTTP handshakes to fingerprint which client is accessing them, and then present different content for different clients."
Examples: [missing]
Show HN at the time: https://news.ycombinator.com/item?id=30378562
Back then (2022) it was Firefox only
I do kind of yearn for the simpler days when, if a website didn't mind bots, it allowed them, and if it did, it blocked your user agent.
Back then websites weren't so resource intensive. The negative backlash towards bots is kind of a side effect of how demanding expectations of web experiences have become.
Cool tool, but it shouldn't matter whether the client is a browser or not. I feel sad that we need such a tool in the real world.
About six months ago I went to a government auction site that required Internet Explorer. Yes, Internet Explorer. The site was active, too; the auction data was up to date. I added a user-agent switcher extension in Chrome, set it to IE, retried, and it worked; all functionality on the site was fine. So yeah, I was both sad and annoyed. My guess is this government office paid for a website 25 years ago and it hasn't been updated since.
In South Korea, ActiveX is still required for many things like banking and government stuff. So they're stuck with both IE and the gaping security hole in it that is ActiveX.
Is this still true? I know it was the case in the past, but even in 2025?
Not really. You can access any Korean bank or government website using Chrome, and they actually recommend Chrome these days.
They still want to install a bunch of programs on your computer, though. It's more or less the same stuff that used to be written as ActiveX extensions, but rewritten using modern browser APIs. :(
Yeah it's probably an ancient web site. This was commonplace back in the day when Internet Explorer had 90%+ market share. Lazy web devs couldn't be bothered to support other browsers (or didn't know how) so just added a message demanding you use IE as opposed to fixing the problems with the site.
You may enter our site iff you use software we approve. Anything else will be seen as malicious. Papers please!
I, too, am saddened by this gatekeeping. IIUC, custom browsers (or user agents) built from scratch will never work on Cloudflare sites and the like until the UA has enough clout (money, users, etc.) to sway them.
This was sadly always going to be the outcome of the Internet going commercial.
There's too much lost revenue in open things for companies to embrace fully open technology anymore.
It's kind of the opposite problem as well; huge well-funded companies bringing down open source project websites. See Xe's journey here: https://xeiaso.net/blog/2025/anubis/
One may posit "maybe these projects should cache stuff so page loads aren't actually expensive" but these things are best-effort and not the core focus of these projects. You install some Git forge or Trac or something and it's Good Enough for your contributors to get work done. But you have to block the LLM bots because they ignore robots.txt and naively ask for the same expensive-to-render page over and over again.
The commercial impact is also not to be understated. I remember when I worked for a startup with a cloud service. It got talked about here, and suddenly every free-for-open-source CI provider IP range was signing up for free trials in a tight loop. These mechanical users had to be blocked. It made me sad, but we wanted people to use our product, not mine crypto ;)
>> Otherwise your users have to see a happy anime girl every time they solve a challenge. This is a feature.
I love that human, what a gem
Wait until you hear that a lot of antivirus/endpoint software blocks "recent" domain names from being loaded. According to them, new domains are only used by evil people and should be blocked.
Kudos to the coder and the poster. I'm involved in a browser project that runs on OpenSSL, and figured I'd have to dig through WireShark myself at some point to figure this stuff out. Well, I may still need to, but now have many points of reference. If the most common use of OpenSSL is Python, then in the age of Cloudflare, a Firefox TLS spoofing option isn't just a good idea, it's a necessity.
Only three patches and some shell wrappers; this should get Daniel coding. IMHO this should definitely be in mainline curl.
I've been using Puppeteer to query and read responses from deepseek.com. It works really well, but I have to use a stealth mode and the "headed" version to make it think it's a person.
The notion of real-world TLS/HTTP fingerprinting was somewhat new to me, and it looks interesting in theory, but I wonder what this build's use case really is. I mean, you have heavy-handed JavaScript checks running everywhere now.
Good luck getting past imperva
If you thought Cloudflare's challenges can be bad, Imperva doesn't even want most humans through.
Now I'm waiting for the MCP version of this.. :)
https://github.com/puremd/puremd-mcp handles this, probably some other MCP servers out there that handle this too
There are APIs that Chrome provides that allow servers to validate whether a request came from an official Chrome browser. That would detect that this curl isn't really Chrome.
It’d be nice if something could support curl’s arguments but drive an actual headless chrome browser.
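Not a drop-in for curl's flags, but for the simple "fetch this page as a real browser" case, headless Chrome can already be driven straight from a shell (the binary name varies by platform: chromium, google-chrome, etc.):

```sh
# Dump the rendered DOM after Chrome has executed the page's JavaScript.
chromium --headless --dump-dom 'https://example.com/' > page.html
# Or render straight to PDF.
chromium --headless --print-to-pdf=page.pdf 'https://example.com/'
```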
Are you referring to the Web Environment Integrity[0] stuff, or something else? 'cos WEI was abandoned in late 2023.
[0] https://github.com/explainers-by-googlers/Web-Environment-In...
Siblings are being more charitable about this, but I just don't think what you're suggesting is even possible.
An HTTP client sends a request. The server sends a response. The request and response are made of bytes. Any bytes Chrome can send, curl-impersonate could also send.
Chromium is open source. If there was some super secret handshake, anyone could copy that code to curl-impersonate. And if it's only in closed-source Chrome, someone will disassemble it and copy it over anyway.
>Chromium is open source. If there was some super secret handshake, anyone could copy that code to curl-impersonate. And if it's only in closed-source Chrome, someone will disassemble it and copy it over anyway.
Not if the "super secret handshake" is based on hardware-backed attestation.
True, but beside the point.
GP claims the API can detect the official chrome browser, and the official chrome browser runs fine without attestation.
> someone will disassemble it and copy it over anyway.
Not if Chrome uses homomorphic encryption to sign a challenge. It's doable today. But then you could just run a real Chrome and forward the request to it.
No, even homomorphic encryption wouldn't help.
It doesn't matter how complicated the operation is, if you have a copy of the Chrome binary, you can observe what CPU instructions it uses to sign the challenge, and replicate the operations yourself. Proxying to a real Chrome is the most blunt approach, but there's nothing stopping you from disassembling the binary and copying the code to run in your own process, independent of Chrome.
> you can observe what CPU instructions it uses to sign the challenge, and replicate the operations yourself.
No you can't; that's the whole point of homomorphic encryption. Ask GPT to explain to you why that's so.
You have no way of knowing the bounds of the code I will access from inside the homomorphically encrypted code. Depending on the challenge, I can query parts of the binary and hash them into the response. So you would need to replicate the whole binary.
Similar techniques are already used today by various copy-protection/anti-cheat game protectors. Most of them remain unbroken.
You're just describing ordinary challenge-response. That has nothing to do with homomorphic encryption and there are plenty of examples from before homomorphic encryption became viable, for example https://www.geoffchappell.com/notes/security/aim/index.htm
Homomorphic encryption hides data, not computation. If you've been trying to learn compsci from GPT, you might have fallen victim to hallucinations. I'd recommend starting from wikipedia instead. https://en.wikipedia.org/wiki/Homomorphic_encryption
And btw most games are cracked within a week of release. You have way too much faith in buzzwords and way too little faith in bored Eastern European teenagers.
> Homomorphic encryption hides data, not computation
Data is computation.
That snippet has nothing to do with homomorphic encryption. It's just the same kind of challenge-response AIM and many others were already doing in the 90s.
You seem convinced that homomorphic encryption is some kind of magic that prevents someone from observing their own hardware, or from running Chrome under a debugger. That's just not true. And I suspect we don't share enough of a common vocabulary to have a productive discussion, so I'll end it here.
I don't believe this is correct. Homomorphic encryption enables computation on encrypted data without needing to decrypt it.
You can't use the result of that computation without first decrypting it though. And you can't decrypt it without the key. So what you describe regarding memory addresses is merely garden variety obfuscation.
Unmasking an obfuscated set of allowable address ranges for hashing given an arbitrary binary is certainly a difficult problem. However as you point out it is easily sidestepped.
You are also mistaken about anti-cheat measures. The ones that pose the most difficulty primarily rely on kernel mode drivers. Even then, without hardware attestation it's "just" an obfuscation effort that raises the bar to make breaking it more time consuming.
What you're actually witnessing there is that if a sufficient amount of effort is invested in obfuscation and those efforts carried out continuously in order to regularly change the obfuscation then you can outstrip the ability of the other party to keep up with you.
I’m interested in learning more about this. Are these APIs documented anywhere and are there server side implementation examples that you know of?
EDIT: this is the closest I could find: https://developers.google.com/chrome/verified-access/overvie... ...but it's not generic enough to support the claim you made.
I think they confused Chrome and Googlebot.
There’s no way this couldn’t be replicated by a special build of curl.
Set a UA and any headers and/or cookies with regular cURL compiled with HTTP/3. This can be done with wrapper scripts very easily. 99.999% of problems solved with no special magic buried in an unclean fork.
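A wrapper in that spirit might look like this (it assumes a curl built with HTTP/3; note it only shapes the HTTP layer, while the TLS handshake underneath is still stock curl's):

```sh
#!/bin/sh
# browser-ish-curl: plain curl with a browser UA, common headers and a cookie jar.
exec curl --http3 --compressed \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  -b cookies.txt -c cookies.txt \
  "$@"
```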
You should really read the "Why" section of the README before jumping to conclusions:
```
some web services use the TLS and HTTP handshakes to fingerprint which client is accessing them, and then present different content for different clients. These methods are known as TLS fingerprinting and HTTP/2 fingerprinting respectively. Their widespread use has led to the web becoming less open, less private and much more restrictive towards specific web clients

With the modified curl in this repository, the TLS and HTTP handshakes look exactly like those of a real browser.
```
For example, this will get you past Cloudflare's bot detection.
The README indicates that this fork is compiled with NSS (from Firefox) and BoringSSL (from Chromium) to resist fingerprinting of the TLS library. CLI flags won't do that.
That doesn't solve the problem of TLS handshake fingerprinting, which is the whole point of this project.
That’s not the point of this fork.
And “unclean fork” is such an unnecessary and unprofessional comment.
There’s an entire industry of stealth browser technologies out there that this falls under.