we're headed towards a really scary future where Google gets to define what the web is https://ferdychristant.com/the-state-of-web-browsers-f5a83a41c1cb
One reason Google, DuckDuckGo etc. are able to deliver results quickly is that they throw MASSIVE amounts of hardware at the problem. Even most searx instances are slow as heck, and they aren't even running a search algorithm or web crawler. PageRank and similar algorithms are a lot more complicated than simple text search, even more so if you're also running a crawler.
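To give a rough feel for why ranking costs more than plain text matching, here is a minimal sketch of a PageRank-style power iteration over a tiny link graph. The function name `pagerank`, the damping factor of 0.85 and the fixed iteration count are assumptions for illustration, not how any real engine does it. Every pass touches every link, which is why this gets expensive at web scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

With only four pages this is instant, but the inner loops scale with the number of links, and a crawler keeps adding more of them.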
@alcinnz @Shamar @bob @xj9 @drwho @mangeurdenuage @enkiv2 @yogthos One thing that might be interesting: you could pretty easily make a search engine that runs as a local addon/application. It would "crawl" the web by keeping track of what links you click. So if you click a link from page "blah" to page "yadda" with text "blargh", there would be a little entry "blah is related to yadda by blargh". Then other users could import your search db into their local search engine.
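A minimal sketch of that idea, assuming a hypothetical `ClickIndex` class (the class, its method names and the JSON dump format are all made up for illustration). It stores "source is related to target by text" entries, looks them up by the words in the link text, and can export and import the db so users can share it.

```python
import json
from collections import defaultdict


class ClickIndex:
    """Hypothetical local search db built from the links a user clicks."""

    def __init__(self):
        # word from the link text -> set of (source, target, text) entries
        self._by_word = defaultdict(set)

    def record_click(self, source, target, text):
        """Store 'source is related to target by text'."""
        entry = (source, target, text)
        for word in text.lower().split():
            self._by_word[word].add(entry)

    def search(self, query):
        """Return entries whose link text contains every word of the query."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._by_word.get(w, set()) for w in words))
        return sorted(hits)

    def export_db(self):
        """Dump all entries as JSON so another user can import them."""
        entries = set().union(*self._by_word.values())
        return json.dumps(sorted(entries))

    def import_db(self, dump):
        """Merge entries from someone else's exported db."""
        for source, target, text in json.loads(dump):
            self.record_click(source, target, text)


if __name__ == "__main__":
    mine = ClickIndex()
    mine.record_click("blah", "yadda", "blargh")
    theirs = ClickIndex()
    theirs.import_db(mine.export_db())
    print(theirs.search("blargh"))
```

Hooking `record_click` up to an actual browser addon is left out here; that part depends on the browser's extension API.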
I know that you can configure YaCy as an HTTP proxy and have it index everything you browse. I haven't done this yet, though.
@popefucker @alcinnz @Shamar @bob @xj9 @mangeurdenuage @enkiv2 @yogthos DHT seems more effective for connecting nodes with each other than, say, an IRC channel on EFnet that nodes announce themselves on.
Searx - optimizing the TCP/IP stack helps somewhat. That's what I did and it's helped immensely.
Algorithms - This! So much this! People don't seem to understand how arcane and complex they are. Textual analytics for search require fuckloads of processing power.
As somebody who works for a (minor) search engine company, I can verify that federation & homogeneity in search will work -- although performance penalties may pop up.
I know this because we ran our searches against ~40 differently-sized machines in a data center until a few months ago, and those machines were on average only like 3x as beefy as a beefy consumer gaming machine.
@enkiv2 @Shamar @alcinnz @bob @xj9 @mangeurdenuage @yogthos It's one of the trade-offs that has to be made when you don't have entire data centers full of tens of thousands of nodes three hops apart from each other. We as a community can't compete in that way. But we can do something different, and improve what we have.
@bob @xj9 @yogthos @mangeurdenuage @enkiv2 Large amount of storage (and I/O capacity) - yes. Inverted indices and word vectors are big. Lots of RAM - to be fast, you at least want the top-level indices in RAM as much as possible. Results not very good - unfortunately, YaCy isn't Google. There isn't a planet's worth of R&D money being thrown at YaCy.
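For anyone who hasn't seen one, here is a rough sketch of an inverted index, assuming whitespace tokenization and made-up function names (`build_inverted_index`, `lookup`). It is not how YaCy stores things; it only shows the shape of the data. Each distinct word keeps a set of every document containing it, which is why these structures grow so fast and why you want the hot parts in RAM.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Return ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


if __name__ == "__main__":
    docs = {
        1: "YaCy is a peer to peer search engine",
        2: "searx is a metasearch engine",
        3: "peer discovery over a DHT",
    }
    index = build_inverted_index(docs)
    print(lookup(index, "search engine"))
    print(lookup(index, "peer"))
```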
I was thinking of an API where the engine could ask the site for any content matching the query. This could handle propagation as well.
For example, if I want to search for a particular text on Mastodon, my server could ask the servers it federates with, those could ask the servers they federate with, and so on.
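A toy sketch of how that propagation might look, with everything in one process. The `Server` class and its `search` method are hypothetical and are not part of any real Mastodon or ActivityPub API; a real version would make network requests and would need timeouts and rate limits. The `seen` set stops the query from looping forever when servers federate with each other in a circle.

```python
class Server:
    """Hypothetical federated server holding some local posts."""

    def __init__(self, name, posts):
        self.name = name
        self.posts = list(posts)
        self.peers = []  # servers this one federates with

    def search(self, query, seen=None):
        """Search local posts, then pass the query on to peers not yet asked."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return []  # already answered this query, avoid loops
        seen.add(self.name)

        needle = query.lower()
        results = [(self.name, post) for post in self.posts if needle in post.lower()]
        for peer in self.peers:
            results.extend(peer.search(query, seen))
        return results


if __name__ == "__main__":
    a = Server("a", ["trying YaCy tonight"])
    b = Server("b", ["nothing here"])
    c = Server("c", ["YaCy as a proxy"])
    a.peers = [b]
    b.peers = [c, a]
    c.peers = [a]
    print(a.search("yacy"))
```

Each server gets asked once per query, so the cost grows with the number of servers reachable through federation.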
@yogthos no way
@arjen it's definitely time to take action and stop using chrome whenever possible
@yogthos yes fully agreed! I switched to Firefox on desktop and mobile, very pleased with the UI/UX.
@arjen and I actually find FF mobile is superior to Chrome on mobile in pretty much every way
@yogthos think so too. Most useful with syncing, though I still have to sort out how to do that myself standalone rather than through Mozilla.