Anubis is literally Hitler

@phnt@fluffytail.org why so much hate towards anubis? I would have thought having some free software against scraping bots would be a good thing (?)

@waifu Because it fails at preventing scraping bots that are smart enough to delete one "l" from Mozilla and deployed by people that will definitely not notice that. While annoying everybody else in the process. I would rather solve a boat challenge from Google or CF than stare at a page that gives me a fake progress bar for 7 seconds at best on my phone.
Record_2025-05-30-14-14-00.mp4
@waifu @phnt >javascript garbage
>protecting random text that nobody gives a shit about being scraped
>doesn't stop any bot that gives half a fart

@phnt@fluffytail.org @waifu@mai.waifuism.life IIRC the point of Anubis is to prevent generic scraper bots, not targeted ones. It doesn't even stop scraping, just slows them down.

I find its quick rise in popularity to be a bit sus tbh


@phnt@fluffytail.org @waifu@mai.waifuism.life If the scraping bots remove the "l" from Mozilla, server operators can easily just block the unique user agent. Anubis isn't "a tool to stop bots", it's a tool to verify that connections pretending to be a browser are actual users.

@tyil @waifu Read the post you replied to again.

And since we are already playing this game of slightly changing the UA: what if I change it to Moozilla, or Mozillla? How will you match against that now, without manually filtering away Mozilla and every legitimate UA that connects to your server? You simply won't, and if you do, you already aren't the quintessential Anubis deployer.
@phnt I've seen API endpoints covered by it, shit's fucked.
@waifu @phnt Because typing -H 'User-Agent: Butthuffer Nigger 3000' is tiring after a while.

@phnt @waifu Read the post of mine again. Anubis isn’t a catch-all bot filter, it’s to filter out connections pretending to be a browser. I know the “lmao I can bypass Anubis with this simple trick” is the peak of /g/ hacker mentality, but in the real world everyone already knew this. It is mentioned in the blog post where the guy talks about it.

Anubis exists solely to make connections that act like browsers prove they are browsers. If a connection doesn’t pretend it’s a browser, Anubis does nothing, as is the intended effect, because at that point if the connection misbehaves, you can just block that particular UA.

@tyil @phnt @waifu this is simultaneously probably all true while also noting the people installing the thing don't actually serve degraded content to other user agents probably blobcatgoogly

@icedquinn @phnt @waifu

> don’t actually serve degraded content to other user agents probably

Depends on what you mean, I do “serve degraded content” to other UAs in the sense that I block LLM scrapers that misbehave when I can. To reiterate, Anubis exists because LLM scrapers pretend to be a browser with a regular browser UA. If one were to block that, you’d block all legitimate browser users, so realistically it’s not an option.

If you pretend to be a regular browser, Anubis exists to verify this. If you don’t pretend to be a regular browser, it does nothing.

How is this so hard to grasp.

@tyil @phnt @waifu i'm pointing out that most of the anubis installs probably don't actually do anything with the user agent string, thus making the hosts as much of derps as /g/

@icedquinn @phnt @waifu Ah, yeah if you don’t do any UA blocking otherwise I guess there’s no point, but in my experience people seek out Anubis because general UA blocking couldn’t get the job done (because LLMs started to pretend to be browsers).

I would expect anyone that has set up Anubis to have reached the point where other forms of access control have stopped being practical.

@tyil @waifu
>Anubis isn't a catch-all bot filter
>2025-01-19 - Block AI scrapers with Anubis
>bypassed by slightly changing browser UA

>Anubis exists solely to ensure connections that act like browsers to prove they are browsers.
No, it exists to prevent scraping by "checking that a browser is valid". Which it does not do at all. It's trivial to change the UA in an instrumented browser.

@phnt@fluffytail.org @waifu@mai.waifuism.life You're having a really hard time reading my posts I guess. I'm sorry for you.

@tyil @waifu You are missing the point completely. There is no reason to prove a browser is "real" in the normal world. If you are trying to do that, you are trying to combat a bot attack and your mitigation simply has no effect since it can be easily bypassed.

@phnt @waifu

> There is no reason to prove a browser is “real” in the normal world.

Okay, keep believing that.

> you are trying to combat a bot attack

Correct. Sounds like there are reasons after all. Only took a whole sentence to figure that one out.

> your mitigation simply has no effect

It appears to have an effect, and not just for my personal cgit. It appears a lot of people are using it because they are seeing a (positive) effect in combatting LLM scrapers with it.

> it can be easily bypassed

It can, and that’s ok. Because if you bypass it, you become a unique UA that I can just block with any regular UA block in HAProxy. Even if you automate “random” UAs, I can put in a pretty excessive UA blacklist with patterns if I so desire. The entire point is that a connection using a regular browser UA has to prove they are in fact a regular, legitimate browser, because blocking those isn’t feasible, because you’d block nearly all legitimate traffic otherwise.

It’s not a hard concept, I think. I don’t know why I have to reiterate the same thing three times for you, but I truly hope this time it’ll stick. If not, for the love of Stallman please just cancel your Internet subscription.
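
As a rough illustration of the UA-pattern idea argued over in this exchange, the sketch below sorts User-Agent strings into three buckets: ones that claim to be a regular browser (the only ones a challenge would need to see), near-misses such as "Moozilla" that could go on a plain block list, and everything else. The function name `classify_user_agent`, the `Mozilla/5.0` prefix check, and the 0.8 similarity threshold are assumptions made up for this example; none of it is taken from Anubis or HAProxy.

```python
from difflib import SequenceMatcher

# Assumption for this sketch: a "regular browser" UA starts with this exact token.
GENUINE_PREFIX = "Mozilla/5.0"


def classify_user_agent(ua: str, threshold: float = 0.8) -> str:
    """Return 'browser-like', 'near-miss', or 'other' for a User-Agent string."""
    parts = ua.split()
    token = parts[0] if parts else ""
    if token == GENUINE_PREFIX:
        # Claims to be a browser: this is the traffic a challenge would verify.
        return "browser-like"
    # Compare the first token with the genuine prefix, ignoring case.
    similarity = SequenceMatcher(None, token.lower(), GENUINE_PREFIX.lower()).ratio()
    if similarity >= threshold:
        # Almost, but not exactly, a browser prefix (e.g. "Moozilla/5.0").
        return "near-miss"
    return "other"


if __name__ == "__main__":
    for ua in ("Mozilla/5.0 (X11; Linux x86_64)", "Moozilla/5.0 (X11; Linux x86_64)", "curl/8.5.0"):
        print(ua, "->", classify_user_agent(ua))
```

Whether a fuzzy match like this is worth maintaining is exactly the point of disagreement in the thread: it catches trivial misspellings, but not a bot that sends an exact browser UA.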

@tyil @waifu
>Because if you bypass it, you become a unique UA that I can just block with any regular UA block in HAProxy
>Its not a hard concept I think. I don't know why I have to reiterate the same thing three times for you, but I truly hope this time it'll stick. If not, for the love of Stallman please just cancel your Internet subscription.

Keep living in your clown world.

Here, have a proper solution that got me 0.1 r/s of bot requests instead of 20 r/s on my git server, instead of your half-assed one.
alibabacloud-git-scraping.txt
huaweicloud-git-scraping.txt
google-usercontent-git-scraping.txt

@tyil @waifu Along with this one:

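# map and map_hash_bucket_size belong in the http {} context
map_hash_bucket_size 256;
# $git_scrapers is 1 when the User-Agent matches a pattern below ("~*" = case-insensitive regex)
map $http_user_agent $git_scrapers {
        default 0;
        "~*claudebot" 1;
        "~*meta-externalagent" 1;
        "~*amazonbot" 1;
}

server {
        server_name whatever;

        listen 443 ssl http2;
        listen [::]:443 ssl http2;

        location / {
                # answer matched scrapers with 402 Payment Required
                if ($git_scrapers = 1) {
                        return 402;
                }
        }
}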
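
For readers who do not use nginx, the matching logic of the `map` block can be sketched in Python. The three patterns are the ones from the config; the function names and the 200 status for everything else are assumptions for this example (the config shown does not say what non-matching requests receive).

```python
import re

# Same three patterns as the nginx map; "~*" there means case-insensitive regex.
SCRAPER_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in ("claudebot", "meta-externalagent", "amazonbot")
]


def is_git_scraper(user_agent: str) -> bool:
    """True when the User-Agent contains one of the self-identifying bot names."""
    return any(pattern.search(user_agent) for pattern in SCRAPER_PATTERNS)


def status_for(user_agent: str) -> int:
    """402 for matched scrapers, as in the config; 200 otherwise (assumed)."""
    return 402 if is_git_scraper(user_agent) else 200


if __name__ == "__main__":
    # Hypothetical User-Agent strings, only for demonstration.
    print(status_for("Mozilla/5.0 (compatible; ClaudeBot/1.0)"))
    print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))
```

This only catches bots that identify themselves honestly, which is the limitation both sides of the thread acknowledge.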

@phnt@fluffytail.org @waifu@mai.waifuism.life It's cute that you only get 20rq/s and think that's the scale of "the real world".

As a sidenote, I'm already blocking entire ASs.

@tyil @phnt @waifu

I am seriously disappointed that this turned out to be so hard to understand
@tyil @waifu Yeah, it is cute that it already brought my old VPS to its knees with the IO. I don't need to throw more hardware at my problems. Yet, unlike you, I can actually deal with it without resorting to nuclear options like PoW challenges. But you can't really annoy anyone with Anubis on your git server, because probably nobody but you goes there.

>As a sidenote, I'm already blocking entire ASs.
"I clicked a button on the Internet, and it gave me a list of subnets for a company that I can block, don't know how though. Probably with nginx or HAProxy."

Also it is really funny to me that you first replied to a post that has a 2 minute video of me loading LKML while being cockblocked by Anubis on my phone and you still defend this piece of garbage software shoved into everyone's face by an egoistical dev that apparently begs for sponsors while allegedly making 180K/yr.

Now, go away, click the boats and try to annoy someone else. Maybe you'll get a more satisfactory reaction there.
image.png
image.png
@waifu @phnt Because of the condescending furry bullshit designed to make you waste CPU cycles because everyone is a regular browser with JavaScript enabled.
@tyil @phnt @waifu

> Anubis isn't "a tool to stop bots",

"Anubis [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using a proof-of-work challenge in order to protect upstream resources from scraper bots."
@tyil @phnt @waifu

> Its cute that you only get 20rq/s and think that's the scale of "the real world".

In the real world, the overwhelming majority of sites do not even get that. The overwhelming majority of "Anubis" deployments sit in front of sites that do not get 20r/day.

@p Anubis is total garbage that makes some sites basically unusable for me when my connection is poor (often). If AI is bad then mechanisms implemented to combat it are worse. You don't combat enshittification with more enshittification. This is like in 《The Matrix》 where they block out the sun to combat the machines, just making an already-bad problem worse. @tyil @phnt @waifu

@adiz @tyil @phnt @waifu

> This is like in 《The Matrix》 where they block out the sun to combat the machines, just making an already-bad problem worse.

This is, from now on, how I explain this to everyone.
@adiz @tyil @p @waifu It's come to the point where I would prefer if people used CuckFlare instead. Because it's that bad.

@p@fsebugoutzone.org @tyil@fedi.tyil.nl @phnt@fluffytail.org what if I want to only get connections from users and want zero bots reading my sites (that already use JavaScript) wouldn't using Anubis work for me?


@waifu@mai.waifuism.life @p@fsebugoutzone.org @tyil@fedi.tyil.nl @phnt@fluffytail.org Bots can access the site even with anubis, it just takes slightly more computational resources
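
To make "slightly more computational resources" concrete, here is a generic hashcash-style proof-of-work sketch: the client searches for a nonce whose SHA-256 digest starts with a number of zero hex digits, and the server checks it with a single hash. This is an illustration of the general technique only; the function names, the challenge string, and the difficulty value are assumptions, and it is not a copy of how Anubis implements its challenge.

```python
import hashlib
from itertools import count


def digest_for(challenge: str, nonce: int) -> str:
    """Hex SHA-256 of the challenge string followed by the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the first nonce whose digest has `difficulty` leading zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        if digest_for(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    return digest_for(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    nonce = solve("example-challenge", 3)  # about 16**3 attempts on average
    print(nonce, verify("example-challenge", nonce, 3))
```

Each extra zero digit multiplies the expected client work by 16, which is why a scheme like this slows a scraper down rather than stopping it: any client willing to spend the CPU time gets through.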


@earslash@ebiverse.social @p@fsebugoutzone.org @tyil@fedi.tyil.nl @phnt@fluffytail.org I'm guessing that's enough to deter a few of them, no? Why do people use it if it doesn't work for this specific purpose?

@waifu @phnt @tyil

> wouldn't using Anubis work for me?

Not really, no. Even if the "just change your UA lol" bug were not present, HeadlessChrome still exists. The best you could do is slow down dishonest bots. This is often enough to get them to go away, but it depends on whether the scraping is targeted or not.
@earslash @phnt @tyil @waifu Slightly less, actually, since you can just change the UA.

Anubis is a bad execution of a bad idea. It is not the only piece of software that does this and is not even close to the best piece of software for this purpose. It's just that the purpose is, 99% of the time, bad and stupid.
@waifu @tyil @p The nature of the Internet prevents you from doing that unless you write a script that analyzes every IP and checks its behavior against what a normal browser session would do (like loading frontend files and making the API calls that result from executing JS). The reality is that once you host something publicly, it's public and restricting access to it is a never-ending cat and mouse game.

As I've said above, your best bet is probably putting your server behind CF and enabling all the protections. You'll annoy people like me and Pete, but at least it does what it advertises. Anubis does not; it blocks only opportunistic scrapers, not someone that has more than 40 IQ and is somewhat determined.

Basically the only way to keep your sanity is to not care about it unless it starts giving you problems. Your fedi instance is probably already getting scraped in some way and you don't even know about it.
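
The per-IP behaviour check described in this post could look roughly like the sketch below: count page requests and asset requests per IP, and flag addresses that pull many pages without ever loading a script, stylesheet, or image. The `(ip, path)` input format, the suffix list, and the `min_pages` threshold are all assumptions for illustration, and a real browser behind a cache or with assets blocked would trip the same check.

```python
from collections import defaultdict

# Assumed list of suffixes a normal browser session would also request.
ASSET_SUFFIXES = (".js", ".css", ".png", ".woff2")


def suspicious_ips(requests, min_pages=5):
    """Return sorted IPs that requested >= min_pages pages and no assets.

    `requests` is an iterable of (ip, path) tuples, e.g. parsed from an access log.
    """
    pages = defaultdict(int)
    assets = defaultdict(int)
    for ip, path in requests:
        # Ignore the query string when deciding whether this is an asset.
        if path.split("?", 1)[0].endswith(ASSET_SUFFIXES):
            assets[ip] += 1
        else:
            pages[ip] += 1
    return sorted(ip for ip, n in pages.items() if n >= min_pages and assets[ip] == 0)


if __name__ == "__main__":
    # Hypothetical log: one browser-like client, one client that only fetches pages.
    log = [("192.0.2.1", f"/notice/{i}") for i in range(6)]
    log.append(("192.0.2.1", "/static/app.js?v=3"))
    log += [("198.51.100.7", f"/notice/{i}") for i in range(6)]
    print(suspicious_ips(log))
```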

@p@fsebugoutzone.org @phnt@fluffytail.org @tyil@fedi.tyil.nl what if I block every user agent that isn't a browser? I only want connections by users not bots

@waifu @p @tyil Also with CloudFlare you have to be extra careful for it to not cache your API queries, auth tokens, etc. It's not that simple, hence why people deploy Anubis while thinking that it's some magic fix-all that will make the bots go away.

@waifu@mai.waifuism.life @p@fsebugoutzone.org @tyil@fedi.tyil.nl @phnt@fluffytail.org It does deter "some" bots - I don't know how effective it is in the larger context. I think a lot of the sites using it have poor coding behind them that can't efficiently handle any sort of major traffic, bot or not.

My insider source says the creator of Anubis is still surprised at the sudden intake of sites using it, and they never expected it to get this far (it was a personal side project after all). The PoW thing is supposedly temporary until they find a new way to determine bot traffic. See
https://github.com/TecharoHQ/anubis/blob/main/web/index.templ

"Ultimately, this is a hack whose real purpose is to give a "good enough" placeholder solution so that more
time can be spent on fingerprinting and identifying headless browsers (EG: via how they do font rendering)
so that the challenge proof of work page doesn't need to be presented to users that are much more likely to
be legitimate."

@waifu @phnt @tyil @earslash

> Why do people use it if it doesn't work for this specific purpose?

You have been on this feddy verse longer than I have and the answer to this question is the same as the answer to the question of why so many instances continue to block Gab, which has not federated since 2019: retards cargo-cult shit.
@waifu @tyil @p That's impossible since UAs vary greatly depending on your browser, browser version, browser engine, OS, architecture of the system, phone, tablets,... And all the Chinese bots use browser user-agents anyway (the reason why Anubis was created). You can block the US scrapers easily based on UA, since they have the decency to be honest about it, but that's about it.

@waifu @p @phnt @tyil
if you really want connections from only humans you’d have to use a custom captcha (off the shelf ones have solvers) or an HTTP password or something. Could kill off some human traffic though.


@p Mastodon block-lists never get investigated or audited. If you get added to one you're there forever. And, they LARP about things like a "consensus model" for determining block validities---but, they all just blindly copy blocks over meaning that by de-facto they have an engineered "consensus" through unanimity. Mastodon + Mastodon users really are like AI/Borg or something. Their very existence is a Fediverse blight. @earslash @tyil @phnt @waifu

@earslash @tyil @waifu @p
>"Ultimately, this is a hack whose real purpose is to give a "good enough" placeholder solution so that more...
This is one of those "hoping to do the almost impossible" types of projects. Google and CloudFlare have been trying to do that for years, they still fail at doing that, and since browsers now prevent a lot of attempts at fingerprinting, it will get increasingly hard. Ultimately they are trying to write the ultimate tracking script that is better at its job than Google Analytics.
@adiz @earslash @tyil @p @waifu Since this is my thread, I can derail it however I want, so here goes :).

Pete thought (thinks) that Fediverse as a network can't survive a split where basically the Mastodon and the rest divide and are unable to talk to each other.
I think that we are already at that point, or very nearly it, and things continue to be more-or-less the same. What changed over the years is a decrease in shitposting and an increase in politics sperging from poast and alike. Something that Mastodon is already mostly known as.

What scares me more than Mastodon going its own way is GoToSocial and its obsession with privacy on a network solely based on trust. I learned from silverpill that they apparently already have some kind of reply restrictions that make you unable to reply (à la Xitter) and other nonsense like that. Combined with all the authenticity and proof of origin work that silverpill has been working on, I think that this can have a much bigger impact, if it ever gets implemented broadly.


Imagine a Fediverse, where your reply can get effectively deleted, because someone remote doesn't like it and can remove that reply from their representation of a replies Collection, meaning that other servers wouldn't know about it unless they specifically federate with you. Now add a function to that, that makes your reply go away even on those servers, because they respect the other remote server. Or imagine a Fediverse where you press "Post" on a reply and you get "replies not allowed", or "your reply awaits approval" back from the remote server.
@phnt @waifu @tyil

> putting your server behind CF and enabling all the protections.

I don't think even that works. I forget who I saw do this but it was someone on fedi trying to scrape a Cloudflare site and all they did was add a loop that checked if the first character of the response was "<" and if so, the program would assume Cloudflare had stopped the req, sleep 100ms, retry.

> not someone that has more than 40 IQ and is somewhat determined.

Even archive.is uses HeadlessChrome and executes JS.

In fact (moving pic related) it's sometimes easier to just scrape archive.is/archive.org/Google Cache/etc.

> Basically the only way to keep your sanity is to don't care about it unless it starts giving your problems.

Yeah: you can't keep control of information once it has left your computer. It's basically DRM.

> Your fedi instance is probably already getting scraped in some way and you don't even know about it.

waifuism stuff was in the Cloudlab scraped data.

(It's funny, the guy--whoever he was--got shut down for ToS violations, right? But he had the MongoDB ports open to the world. So people that dumped his MongoDB data have the data that he paid to scrape, but he *doesn't* have that data.)
impostor.mp4
@phnt @tyil @waifu

> Cloudlab scraped data.

On that topic, the full dump is hard to pass around but if you know how to locate and install bsondump, you can have a look at the "processing_status" and "checked_at" fields to see which ones were scraped directly and when.

Every instance is in the dump, just not all of them got scraped directly. Took them a while because they were scraping followers/following lists.
instances.bson.bz2
@phnt @tyil @waifu

> you have to be extra careful for it to not cache your API queries, auth tokens,

Even if you don't, Cloudfed sometimes gets a wild hair up its ass and decides that every Poast user is now logged in as graf.

@p@fsebugoutzone.org @phnt@fluffytail.org @waifu@mai.waifuism.life @tyil@fedi.tyil.nl
Archive.today (and ghostarchive.org) do not use headless chrome, they use an actual Chrome instance (some sites can tell the difference).

@p @tyil @waifu That was a fun time. IIRC he was doing some scraping testing with his own auth token and CF decided it would be awesome to cache it.
@p @tyil @waifu
>I don't think even that works. I forget who I saw do this but it was someone on fedi trying to scrape a Cloudflare site and all they did was add a loop that checked if the first character of the response was "<" and if so, the program would assume Cloudflare had stopped the req, sleep 100ms, retry.

I haven't tried to scrape a site behind CF for a long time, but it used to be trivial and you could find code all over GitHub that could bypass it. I don't know how it's going right now for non-specialized people. If you know what you are doing, you'll get through it anyway. There's a whole "underground" industry that specializes in bypassing CF and alike blocks.
@waifu @phnt @tyil

> what if I block every user agent that isn't a browser?

archive.is uses HeadlessChrome, pretends to be a browser by its UA, and uses an army of US residential proxies. Try to stop them.

> I only want connections by users not bots

You can't really worry too much about it. Plus you get weird fuckers like me and I pretty reliably (and completely by accident) snap the tripwires and get declared a bot, while actual bots just stroll right past.

Basically, what you have to do is focus on actual problems: "This shit is flooding us, how do we stop the site falling over?" In some cases (not any instance on fedi), having them do some hashing can work. Authwalling works.
mgs_irl--a_cardboard_box.png
@earslash @phnt @tyil @waifu

> I think a lot of the sites using it have poor coding behind them that can't efficiency handle any sort of major traffic, bot or not.

I think they're mostly static blogs run by neurotics and they don't actually *have* a problem with bots.

> still surprised at the sudden intake of sites using it,

They inserted the rent-seeking maneuvers from the start. They were hoping it would happen.

@phnt
>Fediverse as a network can't survive a split where basically the Mastodon and the rest divide and are unable to talk to each other

We're already at that point. The Mastodon side of the Fediverse essentially exists behind a virtual Iron Curtain of their own creation; a digital Berlin Wall of sorts with things like #fediblock. Which, is fine; This instance, my instance, blocks most of the Mastodon network preemptively, leading us to almost entirely exist within the consciousness of non-Mastodon-Fediverse. And, it's a very comfortable place to be. There was nothing gained in federating with Mastodon and, conversely, nothing lost in basically cutting them from our network collectively.

>Imagine a Fediverse, where ……

Simply, instance software should not respect the wishes of remote instance software. Simple as. As far as I'm concerned this stuff only becomes an issue whereupon our own software begins respecting remote instance demands, which I would hope developers like Silverpill never implement. Don't respect blocks, don't respect post denials, don't respect remote deletion requests. All of that, if desired, can be handled locally on the remote server. That's their own problem. You should not ever be able to issue demands against my own server.

E.g.: my server can issue any actions it wants, and receive any actions it wants; Your server does not need to respect any of the actions issued from my server, and vice-versa. If your server does not want to accept posts from my server, or deletes from my server, or wants to delete posts from my server, that's fine……only insofar as it being handled locally on your server---you should not be able to affect my own instance, however. If you want my post removed from your timeline or thread? Fine. But, it ought remain on my own instance and others' own respective instances. You want to block me? Okay. But, I can still communicate with others included in the thread and continue the conversation.

If GoToSocial wants to adopt some hivemind Borg model as well, where any server gains control over remote servers, then so be it; They can do that if they wish. I don't know anybody worth following or caring about that runs GoToSocial and my general impression of the software is that it's exclusively utilized by people somehow worse than Mastodoners.

@earslash @tyil @p @waifu

@earslash @phnt @tyil @waifu

> Could kill off some human traffic though.

This is the thing that really irks me, most of the people using cargo-cult shit *want* traffic, so they usually don't block Google but they're afraid of OpenAI.

@p @phnt @tyil @waifu

> I think they’re mostly static blogs run by neurotics and they don’t actually have a problem with bots.

To be completely honest this is my take as well. There is little to no reason why the Linux Kernel, one of the most resourced (if not the most resourced) FOSS organizations in the world, has difficulty finding servers/creating code that can handle the load without Anubis.

> They inserted the rent-seeking maneuvers from the start. They were hoping it would happen.

It wasn’t until the viral thelibre article came out that people started mass adopting Anubis. It is possible the creator set themselves up for promotion

@adiz @earslash @tyil @phnt @waifu

> they all just blindly copy blocks over

The not-a-real-instance-canary values in the whatsisface blocklists *still* do pretty high numbers.

@phnt @waifu
Holly fuck this is stupid. Yeah, your hatred is definitely justified. Fuck this gay shit.


@phnt@fluffytail.org @adiz@mtl.jinxian.casa @tyil@fedi.tyil.nl @p@fsebugoutzone.org @waifu@mai.waifuism.life While on the topic of Mastodon, keep in mind the default robots.txt on Mastodon is still just "GPTBot", despite the existence of numerous other major AI scraping bots and the addition of GPT-Search. Eugen cannot merge a simple change that adds the new bots, making it almost useless. Really shows the priorities and the mindset of Mastodon devs.
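
The effect of a robots.txt that names only GPTBot can be checked with Python's standard `urllib.robotparser`. The two-line robots.txt below is a stand-in written for this example based on the description in the post, not a copy of the file Mastodon ships, and the helper name `allowed` and the example URL are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Stand-in robots.txt that disallows a single named crawler (assumed content).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def allowed(agent: str, url: str = "https://example.org/about") -> bool:
    """True when the given crawler name may fetch the URL under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "Amazonbot"):
        print(agent, allowed(agent))
```

Any crawler not listed falls through to "allowed", and the file is only advisory in the first place, so it constrains just the bots that choose to honour it.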

@earslash @tyil @waifu @p
>To be completely honest this is my take as well. There is little to no reason why the Linux Kernel, one of the most resourced (if not the most resourced) FOSS organizations in the world, has difficulty finding servers/creating code that can handle the load without Anubis.

LKML kept falling over routinely and probably still does. Either it is an ancient pile of completely unoptimized scripts, or they don't bother hiring competent people to maintain the hardware, or both.

Both are ironic considering that the Linux Foundation has been investing like 70% of its revenue into AI and less than 10% into the kernel. They could also fix their LKML performance problem by shoving it behind a large CDN. kernel.org is already partially behind Fastly, so why couldn't LKML and other kernel services be? I guess the reason is money.

@earslash If you are a user of, or administrator of, Mastodon then you deserve the worst possible Fediverse experience and I hope only bad things for you. To quote Silky Johnston:

>I hate you. I hate you. I don't even know you, and I hate your guts. I hope all the bad things in life happen to you and nobody else but you.

@phnt @tyil @waifu @p

@phnt @adiz @earslash @tyil @waifu

> Pete thought (thinks) that Fediverse as a network can't survive a split where basically the Mastodon and the rest divide and are unable to talk to each other.

Well, a split between *nodes* it can obviously survive. That remark was about a big split where the devs of various AP implementations can't cooperate with each other and start to deliberately diverge with the intention of making it harder to federate.

> I think that we are already at that point, or very nearly it, and things continue to be more-or-less the same.

Well, GTS still isn't popular.

> What scares me more than Mastodon going its own way is GoToSocial

Yeah, they're obsessed with trying to control information that has left their computers. Akkoma had this problem when it existed.

> they apparently already have some kind of reply restrictions that make you unable to reply

Yeah, this is accurate. I mean, you saw the fedilist blog post, right?

> Imagine a Fediverse, where your reply can get effectively deleted, because someone remote doesn't like it and can remove that reply their representation of a replies Collection, meaning that other server wouldn't know about it unless they specifically federate with you.

Hell is in sight.
@earslash @tyil @waifu @p @adiz It's not like search engines respect robots.txt anyway. Google and Yandex do, but Bing does not. Bing's webmaster knowledge base tells you that if you block using robots.txt, you'll get indexed, because they depend on the meta tag being present and the robots.txt "prevents them from seeing it" or whatever. So if you don't want to be indexed by Bing, you have to actually allow everything for Bing specifically and configure the meta tag either in your web server or in your application accordingly.
@earslash @phnt @tyil @waifu

> do not use headless chrome, they use an actual Chrome instance

I don't know how HeadlessChrome works, but Headless Firefox is just "Firefox with a virtual framebuffer rather than a screen" last I checked. If you fake out the webgl calls, it shouldn't be hard to pretend to be a real boy. How certain are you about archive.today/ghostarchive.org using regular Chrome?
@phnt @tyil @waifu Ha, worse, he woke up and it was doing that. Depending on where you landed, you were graf or one of two other accounts, I think one of them was pinemarten.
@phnt @tyil @waifu

> There's a whole "underground" industry that specializes in bypassing CF and alike blocks.

I should just do that for a living.
@j @phnt @tyil @waifu I loved that movie. I think I've seen it three times.
@p That weird meta casting of Robert Downey Jr. as a drug addict at his career low...
@judgedread I spent the first half of the movie going "That fucker needs to get got".
@adiz @earslash @tyil @p @waifu
>As far as I'm concerned this stuff only becomes an issue whereupon our own software begins respecting remote instance demands

Then you'll get blocked and left unable to get their posts thanks to signed fetches. Once a mentality like the one I described gets more prevalent, you can't stop it. Signed fetch can be bypassed somewhat easily, but if a large portion of Fediverse changes into an "I respect the remote server and its view of the world for replies/quotes/whatever" model based on the multiple FEPs that are already in a draft or complete stage, it's essentially over. You've fractured the network beyond repair. You've created a censorship-first decentralized protocol. I've said something similar to him once and he faved it, so I hope he realizes that he's playing with fire, especially since he's been on Fedi for years now.

>you should not be able to effect my own instance, however. If you want my post removed from your timeline or thread? Fine. But, it ought remain on my own instance and others' own respective instances.

https://codeberg.org/fediverse/fep/src/branch/main/fep/7458/fep-7458.md

>I don't know anybody worth following or caring about that runs GoToSocial

I know two people on here that use it and I barely interact with both of those.

>my general impression of the software is that it's exclusively utilized by people somehow worse than Mastodoners.

Correct. The GTS devs blocked Moon from their GitHub after they learned who he was even though he was actually trying to help someone in the bug tracker. They are the Fediverse equivalent of fun police imo.
@adiz @phnt @earslash @tyil @waifu

> instance software should not respect the wishes of remote instance software.

Yeah: mail server. You code defensively. It works like mail servers: you have a protocol, you pass messages back and forth. Most of the people writing fedi software didn't bother to learn the lessons.

> my server can issue any actions it wants, and receive any actions it wants; Your server does not need to respect any of the actions issued from my server, and vice-versa.

This is basically the core of all networked software.

> If GoToSocial wants to adopt some hivemind Borg model as well, where any server gains control over remote servers, then so be it; They can do that if they wish.

I cannot wait for this to backfire.
@p @earslash @tyil @waifu @adiz

>Yeah, this is accurate. I mean, you saw the fedilist blog post, right?

Yes, I did. I even read the issues about it on their bug tracker before that post. It's willful ignorance at that point where you poison data by design and then proudly display the same stats on /. If I were you, I would just scrape the stats from / instead of going through the same API or whatever. And only out of spite.
@earslash @phnt @tyil @waifu

> There is little to no reason why the Linux Kernel, one of the most resourced (if not the most resourced) FOSS organizations in the world, has difficulty finding servers/creating code that can handle the load without Anubis.

Friend of mine is one of the guys hacking on the decentralized LKML mirrors. (They're using IPFS; I will have to ping him when I'm done with my code.)

> It wasn't until the viral thelibre article came out that people started mass adopting Anubis.

I'm not aware of the article but I believe it. Same reason chodejs took off.
0
0
2
@p @adiz @earslash @tyil @waifu

>Well, GTS still isn't popular.

It's getting popular in certain circles that like Sharkey and post weird images, but enough said. I don't want to turn this thread into one of those again.
2
0
2
@earslash @phnt @tyil @waifu @adiz

> the default robots.txt on Mastodon is still just "GPTBot", despite the existence of numerous other major AI scraping bots and the addition of GPT-Search.

It was fine when they were building the surveillance state, but now they're building chatbots: the Mastodong cannot tolerate the chatbot.
1
1
3
@phnt @earslash @adiz @tyil @waifu

> Google and Yandex do, but Bing does not.

Bing and Yandex seem to actually respect the crawl-delay; Google refuses to acknowledge it and won't let you tweak timing information without signing up for Webmaster Tools.

I don't know what Bing does for indexing because I don't think I've ever used their search engine on purpose. (Except their image search had more porn for a while.)
0
0
1

@p@fsebugoutzone.org @phnt@fluffytail.org @tyil@fedi.tyil.nl @waifu@mai.waifuism.life

Headless Chrome doesn't output to a screen at all, the "headful" chrome they use outputs to a virtual framebuffer (similar to headless Firefox with your description). Because Headless chrome didn't output to anything it was possible to detect when it was running. For example, a certain variable relating to graphics would be different, or a driver would be missing from the headless Chrome. There were programs that could "patch up" the discrepancies from the webpage through the browser instrumentation - but there would always be more 'telltale signs' than what the programs could hide. Also, they were able to archive sites that were known to block Chrome headless browsers.

https://freeman.vc/notes/headfull-browsers-beat-headless has a bit more information on the differences

1
1
3
@phnt @adiz @earslash @tyil @waifu

> The GTS devs blocked Moon from their GitHub after they learned who he was

God*damn*.

Glad my logorrhea saved me from getting blocked on there. A handful of paragraphs in, I was like "This is not gonna even fit in the jithub comments."
0
0
1

@phnt
>Sharkey

The curse of Syuilo will take them all. Fork the best, die like the rest. @earslash @tyil @p @waifu

0
0
2
@phnt @adiz @earslash @tyil @waifu

> If I was you, I would just scrape the stats from / instead of getting through same API or whatever. And only out of spite.

I just mark the data-poisoning instances as "malicious" in the DB.
0
0
0
@p @earslash @phnt @tyil @waifu

>since you can just change the UA.
That's just the default configuration.
Anubis is an anti ddos component, a cloudcuck alternative. The authors chose this default for easier adoption.

It is not a novel idea. Pretty sure I have an early implementation somewhere using nginx lua. That's where josh from the farms found it and started using it during the ddos. It was one of the very few things that made a significant difference. That and tor hosting. I think he also tried to sell it as a service later.

Given how embedded Cloudfed is and how exposed a site becomes without it, I'd say it is a good development that this kind of software gains popularity.
2
0
1

@laurel@fsebugoutzone.org @p@fsebugoutzone.org @phnt@fluffytail.org @tyil@fedi.tyil.nl @waifu@mai.waifuism.life KiwiFlare only shows up once per session. Anubis shows up every 3rd time i am navigating a site with Anubis enabled. It seems to me KiwiFlare is better designed than Anubis though they have different goals.

2
0
2
@phnt @adiz @earslash @tyil @waifu

> It's getting popular in certain circles

Ha, once that group all switches to GTS, I will have twice as many reasons to never enable signed fetches.
0
0
0

@p @earslash @tyil @phnt @waifu @adiz
> Hell is in sight.

does depend on the people though? i don't think NAS or dobbs.town would flip the switch on this kind of functionality to "on". the people who do like this hugbox stuff already import the blocklists and de facto created their own part of fedi - if they want remote permission to post things it's just another way for them to kneecap themselves.

2
0
2
@bonifartius @earslash @tyil @phnt @waifu @adiz

> i don't think NAS or dobbs.town would flip the switch on this kind of functionality

Or GLC. Those three are on the "Oh, shit, right, they're using *masto* for some reason" list.

> the people who do like this hugbox stuff already import the blocklists and defacto created their own part of fedi

And now they have another infighting method. It's going to be great.
0
0
2
@earslash @phnt @tyil @waifu

> Because Headless chrome didn't output to anything it was possible to detect when it was running.

Ah, yeah, I can see that making a difference.

>

Headless Chrome doesn't output to a screen at all, the "headful" chrome they use outputs to a virtual framebuffer (similar to headless Firefox with your description). Because Headless chrome didn't output to anything it was possible to detect when it was running. For example, a certain variable relating to graphics would be different, or a driver would be missing from the headless Chrome. There were programs that could "patch up" the discrepancies from the webpage through the browser instrumentation - but there would always be more 'telltale signs' than what the programs could hide. Also, they were able to archive sites that were known to block Chrome headless browsers.

> https://freeman.vc/notes/headfull-browsers-beat-headless has a bit more information on the differences

bigbosssalute
1
0
1
@earslash @phnt @tyil @waifu ...Goddamn that post was a mess. Disregard; intended contents were "Ah, I see" and "Thanks".
0
0
2
@bonifartius @p @earslash @tyil @waifu @adiz The Pleroma Fedi is and probably always will be a nice place, since basically every bigger instance is using a fork of it anyway and a lot of the admins have been here longer than the hugbox/you have the wrong opinion people from the mastodont/GTS side. Maybe it will make more people using instances running on those aware of the madness. If I stop following like three people on the Pleroma side of Fedi, all the broken threads I see would go away. But those people also make interesting posts, so it's always a balance of I'm annoyed, but not enough.
1
0
2
@laurel @earslash @phnt @tyil @waifu

> It is not a novel idea.

Yeah. A friend on fedi whose software is not public yet has a better implementation and it doesn't uwu-furry you.

> Given how embedded Cloudfed is and how exposed a site becomes without it, I'd say it is a good development that this kind of software gains popularity.

It's a catastrophe that also doesn't work.
1
0
1

@phnt I kinda wonder how much of the Mastodon user-base went away with the popularity gain in BlueSky. I've always suggested that Mastodon should just abandon ActivityPub entirely or move purely to whitelist federation because of their behavior. @earslash @tyil @p @waifu @bonifartius

3
0
3
@earslash @phnt @tyil @waifu @p

Anubis has a very extensive configuration, I'm sure frequency can be set.
I just don't think Josh is a good programmer. He is responsible for the whole database getting stolen because he thought he'd program a chat component in (safe) rust. The nature of the bug, him failing to do input validation, doesn't help.
2
0
2
@earslash @phnt @tyil @waifu @laurel

> KiwiFlare only shows up once per session. Anubis shows up every 3rd time i am navigating a site with Anubis enabled.

To be fair, it used to show up so often that you couldn't open a full page over Tor: loading the avatars would chew up your allotment.

zerocool Please, sir, may I have some GETs?
hal9000 Here's a page with a thousand images on it. You won't see the images, just they'll fail to load.

It is better nowadays. If I look at their site, I can view an entire page without a catastrophe. I complained at Crunk but I don't know if he took my complaint seriously or used any of the ideas I tossed at him or if someone else complained or if he was already working on a solution, I just know that at some point it got way better.
1
0
0
@adiz @earslash @tyil @p @waifu @bonifartius
>I kinda wonder how much of the Mastodon user-base went away with the popularity gain in BlueSky.
Not enough.

>I've always suggested that Mastodon should just abandon ActivityPub entirely
Misskey is planning on doing that with a ProtoBuf protocol. At least as a secondary inter-Misskey protocol.

https://github.com/misskey-dev/xq
0
0
1
@phnt @waifu @p @tyil

I am very interested in browser fingerprinting :)
The advanced methods use undocumented js features to infer processor model, ram and other hardware characteristics. There is no real countermeasure to that.
What sophisticated netizens employ is remote controlling browsers that run on a phone or a Mac. That way the hardware looks the same. Think Chinese phone farms or how online.net offers Mac mini timeshares.

Still, certain sites push it further by analyzing behavior across all the points they have inputs. So g*ogle for instance, will monitor your mouse movements, your search terms, how often it sees your fingerprint, etc and use all of that data to score you for captcha v3
1
0
2
@laurel @phnt @tyil @waifu

> The advanced methods use undocumented js features to infer processor model, ram and other hardware characteristics. There is no real countermeasure to that.

It's fun shit!

> That way the hardware looks the same. Think Chinese phone farms or how online.net offers Mac mini timeshares.

brandt
0
0
0
@laurel @earslash @phnt @tyil @waifu

> I just don't think Josh is a good programmer.

He's doing a thing on the side, the code is just a means to an end. I think a lot of the heavy lifting is done by Crunk, but they can't hire out because they rely on their insularity.
1
0
0
@earslash @phnt @tyil @waifu @p

Yeah, that's the lua implementation I mentioned earlier.
0
0
1

@p@fsebugoutzone.org @laurel@fsebugoutzone.org @phnt@fluffytail.org @tyil@fedi.tyil.nl @waifu@mai.waifuism.life

The creator of Anubis claims there is "bait" in Anubis which AI companies are taking
Make of this what you will - in my opinion it's a 100% bluff, considering it is easily bypassable by anyone

1
0
1
@p @earslash @tyil @phnt @laurel @waifu people getting their databases melted when there is literally a query system there to safely yeet parameters on the side has always been some embarrassing shit. PHP programmer level nonsense.
1
0
1

@adiz @phnt @earslash @tyil @p @waifu @bonifartius a lot of the people using bsky are people who hate mastodon because of the shit mastodon instances pull (which you never see around here) but also because of that and "problematic" people on this side of the internet, they want to be in the cool kids club.

If you're not on a instance that blocks, you can feel like Patrick Bateman in the business card scene, able to Link Up (tm) to all the techbros to get yourself a feature because you linked up to the right person and made something foss.
arstechnica.com/gadgets/2022/05/microsoft-open-sourced-the-code-for-1995s-3d-movie-maker-because-someone-asked/

3
0
2

@adiz @phnt @earslash @tyil @p @waifu @bonifartius same eceleb same post on both sites

take a guess on why that crowd moved over to bsky

2
3
6
@sendpaws @earslash @tyil @p @waifu @adiz @bonifartius That's a good thing. Genuinely, if you are on soycial media for upcummies, go away from fedi.

Every time I saw a foone post, it was something weird that I could barely parse. First time I learned who foone was from, I kid you not, an NCommander stream.
2
1
4

@earslash @phnt @bonifartius @tyil @waifu @p @adiz part of why I'm into PC98 stuff is it repels some of the worst people in that so called community I mean some are into it.......but they seem to have a melty about Japanese video games. Turns out being a ex-$10 forum poster rots your brain and makes you feel some type of way about them.

1
0
3
@adiz @tyil @phnt @p @waifu i haven't had issues with anubis after a recent update (i assume) that made it work fine for me across all sites that use it
0
0
0
@phnt @tyil @p @waifu couldn't you also use a browser to load a page then pass the loaded stuff onto the scraper thing?
1
0
1

@phnt @earslash @tyil @p @waifu @adiz @bonifartius foone is literally a typical "retro tech eceleb" (annoying, half the shits wrong) but formatted for Twitter retweets.

Which is the point about bsky I'm making too, the Twitter "cool kids" crowd moved there when they could no longer do the same shit they did back in the day with zero opposition.

2
0
2
@mischievoustomato @tyil @p @waifu Of course you can. It's like offloading captchas to some Indians.
0
0
2
@phnt @earslash @tyil @p @waifu @adiz well, it's gonna happen as people do want that. is it good? nope, but it's gonna happen
0
0
1
@sysrq @phnt @waifu write a shell alias that injects rando ua-sig as env var. pass as arg to curl. enjoy egypt
0
0
1

@mrsaturday @earslash @tyil @phnt @p @waifu @adiz @bonifartius remember if you're an ex "internet hate site" janny, you can wipe the sins away by eating whoppers and big macs

1
0
0
@sendpaws @earslash @tyil @p @waifu @adiz @bonifartius Action Retro is the personification of everything wrong with the retro Apple community.
1
0
1
@adiz @earslash @tyil @phnt @p @waifu if the server software doesn't respect the behavior of another, that'll just kick off a server war in which pleroma will probably be in the losing side or will eventually capitulate
1
0
0

@phnt @earslash @tyil @p @waifu @adiz @bonifartius people shit on 8 bit guy but he actually writes code for old computers

action "abandoned fedi for twitter" retro: WOW 😮 I bought an Apple 🖥️🍎 ACCELERATOR CARD 🚀 WOAHHHHH 🤯🤯🤯

1
0
5

@mischievoustomato There will be no "war". You're just going to see the network schism between those who want to have fun vs. those who want to control everything and everyone. Which, is basically the contemporary status quo anyway. So, I don't really think that much will change in general. @earslash @tyil @phnt @p @waifu

0
0
2
@sendpaws @earslash @tyil @p @waifu @adiz @bonifartius
*swings arm in front of the camera for the 20th time in 5 minutes*
1
1
2

@phnt @earslash @tyil @p @waifu @adiz @bonifartius codewarrior mac programming? nah fuck that shit we're gonna install Limp Bizkit Linux and stare at the desktop

0
0
1
@adiz @earslash @tyil @phnt @p @waifu @bonifartius These people live for the fight, for the negativity, for attacking that deplorable "other". Going to Bluesky would deprive them of that, at least until they start fracturing and purity spiraling. They need us, but we sure as hell don't need them.
2
1
5

@p @earslash @tyil @phnt @waifu @adiz I'll never understand why people are so fervent on blocking AI user-agents.

I could maybe understand it if the rationale was traffic/load, but it really does appear to be "le ai bad" which has been rolled into the hivemind shit along with "le free speech bad"

But ultimately, all you're doing is ensuring that your message, the message you supposedly wanted out on the Internet for people to interpret and maybe accept, will not be included in the model and therefore cannot influence its output.

My Ekko lore archive gets a *lot* of traffic from OpenAI and Anthropic and I'm very happy about this because that means I'm in a position to countermessage TV Show before the LLM even begins producing a response to the user query.

2
0
2

@sendpaws
>fewer people tell me to kill myself here

Didn't BS just experience a sort of cultural disruption when administration decided that people couldn't tell other people to commit suicide anymore? @mrsaturday @earslash @tyil @phnt @p @waifu @bonifartius

3
0
1
@adiz @earslash @tyil @phnt @p @waifu @sendpaws @bonifartius The risky part about licking a boot 24/7 is your teeth are really close if you annoy the person wearing it.
Total Bluesky Death
1
0
1
@p @earslash @tyil @phnt @laurel @waifu Everyone bitched at Josh about it, but he was literally writing it in the middle of 2 ongoing DDOS attacks. It got better once the rotating reverse proxies were in place and he could scale it back. Now it only triggers roughly every 24 hours for me.
1
0
0
@PunishedD @earslash @laurel @phnt @tyil @waifu

> but he was literally writing it in the middle of 2 ongoing DDOS attacks.

Well, hacking on someone else's code, not writing it from scratch.
0
0
1
@earslash @phnt @tyil @waifu @laurel

> in my opinion its a 100% bluff

Yeah, seems that way.
0
0
0
@sendpaws @earslash @tyil @phnt @p @waifu @adiz @bonifartius The bathtowel tacked to the wall makes this a trailer trash masterpiece
1
0
1
@icedquinn @earslash @laurel @phnt @tyil @waifu

> there is literally a query system there to safely yeet parameters on the side has always been some embarassing shit.

I have never introduced a goddamn SQLi bug, I don't know how people fuck up that badly.
0
0
0

@mrsaturday @earslash @tyil @phnt @p @waifu @adiz @bonifartius that's to cover the rage hole he punched in his wall after raging on stream

0
0
1
@sendpaws @adiz @phnt @earslash @tyil @waifu @bonifartius

> same eceleb same post on both sites

Dear god, he's doing ~*~CoNtEnT~*~.
0
1
1
@earslash @phnt @bonifartius @tyil @waifu @sendpaws @adiz I have seen some of his posts, they were fun. I probably just ignored it if there was anything political.

It is kind of interesting that he is gayflaggin' on busky but not on fedi.
0
0
0
@phnt @sendpaws @adiz @bonifartius @earslash @tyil @waifu

> Genuinely, if you are on soycial media for upcummies, go away from fedi.

They are called "updoots" and they are to be Respected on the Mastadon Network.



(But also: seconded, you are completely correct.)
3
0
3
@sendpaws @earslash @phnt @bonifartius @tyil @waifu @adiz

> but they seem to have a melty about Japanese video games.

I have a whole theory about that.
0
0
0
@sendpaws @phnt @earslash @tyil @waifu @adiz @bonifartius

> foone is literally a typical "retro tech eceleb" (annoying, half the shits wrong) but formatted for Twitter retweets.

I never paid attention to people like this on Twitter, I didn't know this was a known person. I see a bunch of weird shit, people babbling about computers. His account looked read-only/broadcast but I didn't realize it was crossposted to Twitter (Elon Editor) or Twitter (Rental Edition).

> the Twitter "cool kids" crowd moved there when they could no longer do the same shit they did back in the day with zero opposition.

That's the main benefit of busky.
0
0
2

@p @phnt @tyil @waifu ok, that’s something I need to look out for at work maybe, is there some documentation on this issue?

1
0
1
@maxburn @phnt @tyil @waifu You can probably just ask graf, but it was more or less that. Goes to bed, wakes up and suddenly a bunch of people are sending him DMs as himself saying "I'm in your account". They must have changed something, or it was possibly manual as Cloudfed has been known to occasionally take advantage of their global-reach MitM.
0
0
0
@r000t @earslash @tyil @phnt @waifu @adiz

> I'll never understand why people are so fervent on blocking AI user-agents.

50% fear of the unknown, the other 50% is that ChatGPT shames them by writing better posts.

> it really does appear to be "le ai bad"

Well, you remember when people flipped out that Microsoft used Github to train Copilot, right? It seriously is just that.

> I'm very happy about this because that means I'm in a position to countermessage TV Show

Ha, that's awesome.
0
0
0
@sendpaws @mrsaturday @earslash @tyil @phnt @waifu @adiz @bonifartius

> fewer people tell me to kill myself here.

Fewer people there, I think.

> and i learn new stuff about how people have sex every day

alexjonesconfuse
0
0
0
@p @earslash @tyil @phnt @waifu @sendpaws @adiz @bonifartius
idk almost all social media creates some sort of feedback meta that can potentially be addictive or program you into a worse version of yourself.
Even on imageboards replies create feedback loops, that's why contrarianism is so prevalent (tho to be fair it's a better feedback loop than updoots because it can encourage creative and unorthodox thinking, while likes and favs and all that reward compliance and agreeableness)
1
0
0
@dagda @adiz @bonifartius @earslash @phnt @sendpaws @tyil @waifu

> idk almost all social media creates some sort of feedback meta

There are people that do engagement-whoring and this is a thing that Twitter already has and it's awful. Nobody does that shit on IRC. Nobody does that shit when talking to their friends at a bar. I mean, sure, you tell a joke, you want your friends to laugh, but it's miles from there to "I am an empty vessel for the Content." You're not trying to game a system to get money and fame. People that *are* trying to game fedi to acquire money and influence should fuck off from fedi, I stand by that completely. #mutualgrids
1
0
2
@fristi @adiz @bonifartius @earslash @mrsaturday @phnt @sendpaws @tyil @waifu This is severely off-topic but I spent like 20 minutes yesterday trying to convince the cat to stop trying to fuck up a dog for walking into the yard. Dog was very sorry.
70A3D406-4EF1-41E8-83A1-A71F39B816C5.jpeg
0
0
2
@p @earslash @tyil @waifu @sendpaws @adiz @bonifartius Luke Smith. He's a meme in certain Linux circles for his sometimes weird takes on topics ranging from software to philosophy and religion.

He started by doing LaTeX tutorials while embracing software minimalism when doing his Master's in linguistics running Arch, later Void Linux and then Artix. As time progressed he canceled his Internet connection at home and instead walked to campus if he needed Internet for something. He's best known for his ramblings about almost anything in the woods. Later he moved to the middle of nowhere with a barely working Internet connection and that's where the Luke Smith Pipeline was born (embrace minimalism, get frustrated at modern computers, move to the middle of nowhere, make 5 videos and vanish from the Internet for a year). That was his "peak" kawntent and from there he periodically uploads a few videos and then vanishes for a year.

He has a Peertube instance at videos.lukesmith.xyz and an old podcast at notrelated.xyz.

My favorite quote is probably: "I'm way too stupid to use Ubuntu." He said it when he was setting something up, refused to use Docker, and Ubuntu didn't have new enough packages for it.
luke-smith-georgia-stones.jpg
luke-smith-proprietary-sink.mp4
2
0
4
Actual good solution, cheers for these! :D
0
0
1
@phnt @adiz @bonifartius @earslash @sendpaws @tyil @waifu

> Luke Smith.

Okay, he's on fedi, right, I swear I have talked to that guy.

> My favorite quote is probably: "I'm way to stupid to use Ubuntu." When he was setting up something, refused to use Docker and Ubuntu didn't have new enough packages for something.

I feel like I could party with this guy.
2
0
2
>Even if you automate "random" UAs, I can put in a pretty excessive UA blacklist with patterns if I so desire.


Here's a list of 46,000 browser-specific user-agents which contain no variant of the word "mozilla."
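To make the pattern argument concrete, here is a minimal sketch in Python. It assumes a hypothetical filter that only challenges requests whose User-Agent matches a pattern; `EXACT`, `FUZZY`, `is_challenged` and the sample strings are all made up for illustration and are not taken from Anubis or any real configuration. It shows that a looser pattern catches the misspelled variants mentioned earlier in the thread, and that neither pattern does anything against a UA containing no variant of the word at all.

```python
import re

# Hypothetical rule: challenge anything that claims to be a browser.
EXACT = re.compile(r"Mozilla")
# Hypothetical looser rule: case-insensitive, tolerates repeated or dropped letters.
FUZZY = re.compile(r"mo+zil+a", re.IGNORECASE)


def is_challenged(user_agent, pattern):
    """Return True if the given pattern would send this UA to the challenge page."""
    return pattern.search(user_agent) is not None


# Made-up sample user agents for illustration only.
SAMPLES = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozila/5.0 (X11; Linux x86_64)",
    "Moozilla/5.0",
    "Mozillla/5.0",
    "YOUR AD HERE",
    "curl/8.5.0",
]

if __name__ == "__main__":
    for ua in SAMPLES:
        print(f"{ua!r}: exact={is_challenged(ua, EXACT)} fuzzy={is_challenged(ua, FUZZY)}")
```

The fuzzy pattern is only as good as the misspellings someone thought of in advance, which is the point being made above.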
0
0
1
@dagda @adiz @bonifartius @earslash @phnt @sendpaws @tyil @waifu I mean, while we're on the topic, I am moderately embarrassed that I didn't expect Nostr to be overrun by bots the second I heard about "zaps".
0
0
2
I don't know how true this is, but I've heard that it was written by an AI. Which is really ironic
0
0
0
@phnt @earslash @tyil @p @waifu @sendpaws @adiz @bonifartius >My favorite quote is probably: "I'm way to stupid to use Ubuntu." When he was setting up something, refused to use Docker and Ubuntu didn't have new enough packages for something.
*bell curve meme*
0
0
1
@p @tyil @phnt @waifu I've only ever watched it while extremely high on DXM
1
0
0
@p @phnt @adiz @bonifartius @earslash @sendpaws @tyil @waifu

You would have little issue communicating with Luke, arriving at direct and blunt honesty is identical if the motivation is autism or orthodox Christianity.
1
0
2

@phnt @Inginsub @waifu so it doesn’t solve scraping at all? What’s the use case then? I was considering putting it on my server.

2
0
0
@sarvo @Inginsub @waifu It mostly solves opportunistic scraping, but someone even slightly determined will just get right through it. Most US-based AI company scrapers will completely bypass it, so unless you are having actual performance issues due to scraping, probably don't bother.
1
0
1

@phnt @Inginsub @waifu more than performance I don’t want my posts or my users’ posts to end up being datamined. What would be a good approach for that goal?

1
0
0
@sarvo @Inginsub @waifu There basically isn't one other than watching the logs periodically for suspicious activity. Those that already scrape Fedi aren't that dumb and masquerade as Mastodon or some other Fedi server which would completely bypass Anubis.

You are on a federated network and cannot control how the data spreads beyond your own instance. Posts from your instance show up on publicly accessible timelines on other servers and whatnot, where they can be easily scraped without you ever knowing. The best you can do, really, is to disable access to accounts and timelines for unauthenticated users on the UI side, and that's about it for preventative measures. If Misskey can even do that. That said, completely disabling access for unauthenticated users to posts (threads) might annoy some who use it to view posts that haven't federated to their instance.

_Treat everything you post like public information (even followers-only posts) and your users should do the same. Privacy barely exists on this network._

I don't have other recommendations beyond checking your logs for suspicious requests (more frequent than they should be to the same endpoint), checking those IPs and if they aren't a legitimate instance, just nullrouting them in the firewall. Putting Anubis on your UI API endpoints might block some scrapers, but using it globally will do nothing since you can still get posts from elsewhere and without a browser user-agent. Otherwise the federating protocol wouldn't be able to work.
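The log check described above could be sketched roughly like this in Python. It assumes an nginx/Apache style access log in common or combined format; the regex, the `noisy_clients` name and the `per_minute_limit` threshold are illustrative assumptions, not something from any particular server setup. It counts requests per client IP and path within each minute and reports the pairs that exceed the limit, which are the ones worth looking up before nullrouting anything.

```python
import re
from collections import Counter

# Assumed log format: '<ip> - - [30/May/2025:14:14:00 +0000] "GET /path HTTP/1.1" ...'
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)


def noisy_clients(lines, per_minute_limit=60):
    """Map (ip, path) to its busiest per-minute request count, if above the limit."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        minute = match["ts"][:17]  # '30/May/2025:14:14' (timestamp without seconds)
        path = match["path"].split("?", 1)[0]  # ignore the query string
        hits[(match["ip"], path, minute)] += 1

    worst = {}
    for (ip, path, _minute), count in hits.items():
        if count > per_minute_limit:
            key = (ip, path)
            worst[key] = max(worst.get(key, 0), count)
    return worst
```

Something like `noisy_clients(open("/var/log/nginx/access.log"))` would be the typical call, with the path being whatever your server actually writes to. Legitimate instances can also be bursty, so the output is a list of things to check, not a blocklist.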
1
0
0

@sarvo @Inginsub @waifu

> but using it globally will do nothing since you can still get posts from elsewhere and without a browser user-agent. Otherwise the federating protocol wouldn't be able to work.

Check this out, here are your public posts even if you would completely disable public access to Misskey's API:

curl -LH "Accept: application/activity+json" "https://novoa.nagoya/users/8ukmmetqq5/outbox?page=true"

How did I get there?

1. https://novoa.nagoya/.well-known/webfinger?resource=acct:sarvo@novoa.nagoya in a browser
2. copy the URL from the href field of entry "0" in the links array
3. curl -LH "Accept: application/activity+json" "https://novoa.nagoya/users/8ukmmetqq5"
4. find the outbox field
5. curl -LH "Accept: application/activity+json" "https://novoa.nagoya/users/8ukmmetqq5/outbox"
6. copy URL from the "first" field
7. curl -LH "Accept: application/activity+json" "https://novoa.nagoya/users/8ukmmetqq5/outbox?page=true"
8. Profit
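
The steps above can be sketched as a small Python script using only the standard library. The function names are made up for illustration. One deliberate difference from the manual steps: instead of taking entry "0" of the links array, the sketch looks for the link with rel "self" and the ActivityPub media type, on the assumption that this is how the actor URL is usually advertised; some servers use a different type string, so treat that check as an assumption too.

```python
import json
import urllib.parse
import urllib.request

ACTIVITY_JSON = "application/activity+json"


def webfinger_url(acct):
    """Build the WebFinger lookup URL for 'user@host' (step 1)."""
    host = acct.rsplit("@", 1)[1]
    resource = urllib.parse.quote(f"acct:{acct}", safe=":@")
    return f"https://{host}/.well-known/webfinger?resource={resource}"


def actor_url(webfinger_doc):
    """Pick the ActivityPub actor URL out of a WebFinger response (step 2)."""
    for link in webfinger_doc.get("links", []):
        if link.get("rel") == "self" and link.get("type") == ACTIVITY_JSON:
            return link["href"]
    raise ValueError("no ActivityPub 'self' link in WebFinger response")


def first_page_url(outbox_doc):
    """Return the URL of the first outbox page (step 6); 'first' may be a URL or an object."""
    first = outbox_doc["first"]
    return first if isinstance(first, str) else first["id"]


def fetch_json(url, accept=ACTIVITY_JSON):
    """GET a URL with an Accept header and decode the JSON body; redirects are followed."""
    request = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def first_outbox_page(acct):
    """Walk WebFinger -> actor -> outbox -> first page, like the curl commands above."""
    finger = fetch_json(webfinger_url(acct), accept="application/jrd+json")
    actor = fetch_json(actor_url(finger))
    outbox = fetch_json(actor["outbox"])
    return fetch_json(first_page_url(outbox))
```

Calling `first_outbox_page("sarvo@novoa.nagoya")` would make the same requests as the curl commands, with no browser user-agent involved at any point.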
0
0
0

@r000t

> I could maybe understand it if the rationale was traffic/load

That is my rationale, at least. I only deploy Anubis on cgit because that’s the only service I host that goes OOM whenever it gets several thousand requests per second. My blog is a completely static site, it can handle that kind of load from LLM scrapers, so I have no need to try and limit the amount of connections there.

0
0
0
@tyil @phnt @waifu And I just realized that my UA being "YOUR AD HERE" is probably why I never see the Anubis thing.
1
0
1
@p @tyil @waifu How many sites refuse to give you the actual page because they think your browser is "unsupported"?
1
0
1
@tyil @phnt @waifu Fewer than you'd expect if you think most people are retarded, more than you'd expect if you expect coders to not be *too* stupid. Most coders that are in a position to implement browser-sniffing think it's retarded and most managers don't know enough to ask for it, so it's more often that some frontend dingus tries to detect the browser and hoses it. I just disable their JS and read the page.
0
0
2