We're headed towards a really scary future where Google gets to define what the web is.

@xj9 @mangeurdenuage completely agree, and I think federation could get us there. Would be interesting to come up with a federated search engine, maybe something similar to ActivityPub but focused on searching.

@xj9 @enkiv2 @mangeurdenuage @yogthos YaCy is definitely not meant to run on a NAS. Search is a highly complex thing, and complexity of code is concomitant.

Said writeup:

@xj9 @drwho @yogthos @mangeurdenuage @enkiv2

I haven't tried adding YaCy on #freedombone for similar reasons. Everything I had read about it said that the results just weren't very good.

So I guess there's a gauntlet to be hurled around. Is it possible to make a p2p search engine? There are a lot of things to be searched, so I expect a large amount of storage would be needed.

@bob @xj9 @drwho @mangeurdenuage @enkiv2 @yogthos This question is what excites me.

So far my conclusion is "not as long as we insist on having one all-powerful search engine".

I thought that was what differentiated it from Archie?

Have I gotten confused?

@xj9 @yogthos @bob @mangeurdenuage @drwho @alcinnz


I think so

There are a lot of local installations but they don't exchange crawl data IIUC. Which would be my understanding of federation.

@alcinnz @drwho @mangeurdenuage @bob @yogthos @xj9

@alcinnz @Shamar @bob @xj9 @drwho @mangeurdenuage @enkiv2 @yogthos yeah. It's probably constrained by the use of a DHT.

One reason why Google, DuckDuckGo, etc. are able to deliver results quickly is that they throw MASSIVE amounts of hardware at the problem. Even most searx instances are slow as heck, and they aren't even running a search algorithm or web crawler. PageRank and similar algorithms are a lot more complicated than simple text search, even more so if you're also running a crawler.
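To give a feel for why link-based ranking costs more than plain text matching, here is a minimal sketch of PageRank-style power iteration. The function name, the toy graph, and the damping factor of 0.85 are assumptions for illustration; this is not how any particular engine implements ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing score along links.

    links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping every page to a score; scores sum to 1.
    """
    # Collect every page, including ones that are only link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share on each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: a -> b -> c -> a, plus d -> a. Nothing links to d.
toy_links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
toy_ranks = pagerank(toy_links)
```

Each round touches every link in the graph, and the whole graph has to be crawled before the first round can start, which is part of why this scales so much worse than matching text on one page.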

@alcinnz @Shamar @bob @xj9 @drwho @mangeurdenuage @enkiv2 @yogthos One thing that might be interesting: you could pretty easily make a search engine that runs as a local addon/application. It would "crawl" the web by keeping track of what links you click. So if you click a link from page "blah" to page "yadda" with text "blargh", there would be a little entry "blah is related to yadda by blargh". Then other users could import your search db into their local search engine.
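A rough sketch of that idea, assuming the addon hands us (source page, target page, link text) for every click. The `ClickIndex` class and its method names are made up for illustration; no existing addon is being described here.

```python
from collections import defaultdict


class ClickIndex:
    """Local search db built from the links a user actually clicks."""

    def __init__(self):
        # Every recorded (source, target, text) triple.
        self.edges = set()
        # Lowercased link-text word -> set of triples containing it.
        self.by_word = defaultdict(set)

    def record_click(self, source, target, text):
        """Store 'source is related to target by text'."""
        edge = (source, target, text)
        if edge in self.edges:
            return
        self.edges.add(edge)
        for word in text.lower().split():
            self.by_word[word].add(edge)

    def export_db(self):
        """Return the triples in a form another user can import."""
        return sorted(self.edges)

    def import_db(self, other_edges):
        """Merge triples exported from someone else's index."""
        for source, target, text in other_edges:
            self.record_click(source, target, text)

    def search(self, query):
        """Return target pages whose link text contains every query word."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.by_word.get(w, set()) for w in words))
        return sorted({target for _, target, _ in hits})


mine = ClickIndex()
mine.record_click("blah", "yadda", "blargh")

theirs = ClickIndex()
theirs.record_click("foo", "bar", "blargh stuff")

# Importing another user's db makes their clicks searchable locally.
mine.import_db(theirs.export_db())
```

Because the db is just a set of triples, merging two users' dbs is a set union, so imports can happen in any order and repeated imports are harmless.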

@popefucker @alcinnz @Shamar @bob @xj9 @mangeurdenuage @enkiv2 @yogthos One used to exist, but I don't know what happened to it. Not many people used it, and eventually it wrapped up.

I know that you can configure YaCy as an HTTP proxy and have it index everything you browse. I haven't done this yet, though.

@popefucker @alcinnz @Shamar @bob @xj9 @mangeurdenuage @enkiv2 @yogthos DHT seems more effective for connecting nodes with each other than, say, an IRC channel on EFnet that nodes announce themselves on.

Searx - optimizing the TCP/IP stack helps somewhat. That's what I did and it's helped immensely.

Algorithms - This! So much this! People don't seem to understand how arcane and complex they are. Textual analytics for search require fuckloads of processing power.

As somebody who works for a (minor) search engine company, I can verify that federation & heterogeneity in search will work -- although performance penalties may pop up.

I know this because we ran our searches against ~40 differently-sized machines in a data center until a few months ago, and those machines were on average only like 3x as beefy as a beefy consumer gaming machine.

@alcinnz @bob @xj9 @drwho @mangeurdenuage @yogthos

@enkiv2 @Shamar @alcinnz @bob @xj9 @mangeurdenuage @yogthos It's one of the trade-offs that has to be made when you don't have entire data centers full of tens of thousands of nodes three hops apart from each other. We as a community can't compete in that way. But we can do something different, and improve what we have.

@alcinnz @bob @xj9 @mangeurdenuage @enkiv2 @yogthos YaCy is peer-to-peer - it uses a DHT to find other federated YaCy nodes, and they pass search and indexing requests between each other (if you don't have YaCy in Robinson Crusoe mode) to make up a network of search nodes.
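For anyone unfamiliar with how a DHT decides which node holds what, here is a generic sketch: hash the search term and the node names into the same key space, then pick the nodes whose keys are closest by XOR distance. This is a Kademlia-style illustration only; the function names, the use of SHA-1, and the peer names are assumptions, and it is not a description of YaCy's actual scheme.

```python
import hashlib


def key_for(value):
    """Map a string to a 160-bit integer key."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def responsible_peers(term, peer_names, replicas=2):
    """Return the peers whose keys are closest to the term's key (XOR distance)."""
    term_key = key_for(term)
    by_distance = sorted(peer_names, key=lambda name: key_for(name) ^ term_key)
    return by_distance[:replicas]


# Hypothetical peer names; any node can compute the same answer locally.
peers = ["peer-alpha", "peer-beta", "peer-gamma", "peer-delta"]
holders = responsible_peers("federation", peers)
```

Every node computes the same answer without asking a central server, which is the property that makes a DHT attractive for spreading an index across volunteers.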

@bob @xj9 @yogthos @mangeurdenuage @enkiv2 Large amount of storage (and I/O capacity) - yes. Inverted indices and word vectors are big. Lots of RAM - to be fast, you at least want the top-level indices in RAM as much as possible. Results not very good - unfortunately, YaCy isn't Google. There isn't a planet's worth of R&D money being thrown at YaCy.
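To make the storage point concrete, here is a toy inverted index. It stores a position for every occurrence of every term, so the index grows with the total number of words crawled. The function names and sample documents are assumptions for illustration, and the scoring is a plain occurrence count, nothing like what a production engine uses.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build term -> {doc_id: [positions]} from a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Return ids of docs containing every query term, most occurrences first."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [set(index.get(term, {})) for term in terms]
    matches = set.intersection(*postings)

    def score(doc_id):
        return sum(len(index[term][doc_id]) for term in terms)

    # Sort by descending score, then by id so ties are deterministic.
    return sorted(matches, key=lambda doc_id: (-score(doc_id), doc_id))


docs = {
    "a": "p2p search is hard",
    "b": "search search search",
    "c": "cats",
}
index = build_index(docs)
results = search(index, "search")
```

Lookups only touch the postings for the query terms, which is why keeping the top-level term table in RAM matters so much for speed.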

@xj9 @enkiv2 @mangeurdenuage @yogthos Searx is really nice (I use it for a couple of things), but it's a front-end to a bunch of search engines that we don't control (most of the time - you can point it at your own YaCy instance if you want).

Search and metasearch are two different things.

@xj9 @enkiv2 @drwho @mangeurdenuage @yogthos YaCy is rad and kool. A saner implementation of the server-to-server protocol would be kool.

@enkiv2 @xj9 @mangeurdenuage @drwho I'm familiar with p2p search engines like YaCy, but these only work for static sites.

I was thinking of an API where the engine could ask the site for any content matching the query. This could handle propagation as well.

For example, if I want to search for a particular text on Mastodon, the server could ask the servers it federates with, and so on.
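A minimal sketch of how that propagation might work, with in-process objects standing in for servers and a method call standing in for the network request. The `Server` class, `federated_search`, and the `max_hops` limit are all hypothetical; no existing Mastodon or ActivityPub endpoint is being described.

```python
class Server:
    """Stand-in for one federated instance with its own local content."""

    def __init__(self, name, posts):
        self.name = name
        self.posts = list(posts)
        self.peers = []  # servers this instance federates with

    def local_search(self, query):
        """Return local posts containing the query, case-insensitively."""
        needle = query.lower()
        return [post for post in self.posts if needle in post.lower()]


def federated_search(start, query, max_hops=2):
    """Ask the start server, then its peers, then their peers, up to max_hops."""
    seen = {start.name}  # never ask the same server twice
    frontier = [start]
    results = []
    for hop in range(max_hops + 1):
        next_frontier = []
        for server in frontier:
            results.extend((server.name, post) for post in server.local_search(query))
            if hop < max_hops:
                for peer in server.peers:
                    if peer.name not in seen:
                        seen.add(peer.name)
                        next_frontier.append(peer)
        frontier = next_frontier
    return results


a = Server("a", ["federated search is neat"])
b = Server("b", ["Search engines"])
c = Server("c", ["nothing here"])
d = Server("d", ["deep search"])
a.peers = [b]
b.peers = [a, c]
c.peers = [d]

hits = federated_search(a, "search", max_hops=2)
```

The hop limit and the `seen` set are there so a query can't circulate forever around a loop of servers; a real protocol would also need timeouts, rate limits, and some answer to the privacy concerns around making posts searchable.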

@enkiv2 @xj9 @mangeurdenuage @drwho Basically, it would be good to have searchability built into the network itself. This way you wouldn't need a complex external tool to trawl it.

And if there was a standard API, all federated networks could start implementing it. :)

@yogthos @enkiv2 @xj9 @mangeurdenuage As I recall, that kind of searchability is not implemented in Masto deliberately.

@yogthos @enkiv2 @xj9 @mangeurdenuage That doesn't make any sense at first glance - could you unpack the "only work for static sites" bit?

@arjen it's definitely time to take action and stop using Chrome whenever possible

@yogthos yes fully agreed! I switched to Firefox on desktop and mobile, very pleased with the UI/UX.

@arjen and I actually find FF mobile is superior to Chrome on mobile in pretty much every way

@yogthos think so too. Most useful with synching, though I still have to sort out how to do that myself standalone rather than through Mozilla.
