Inverting the Web 

We use search engines because the Web does not support accessing documents by anything other than URL. This puts a huge amount of control in the hands of the search engine company and those who control the DNS hierarchy.

Given that search engine companies can barely keep up with the constant barrage of attacks, commonly known as "SEO". intended to lower the quality of their results, a distributed inverted index seems like it would be impossible to build.

@freakazoid What methods *other* than URL are you suggesting? Because it is imply a Universal Resource Locator (or Identifier, as URI).

Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.

My read is that search engines are a necessity born of no intrinsic indexing-and-forwarding capability which would render them unnecessary. THAT still has further issues (mostly around trust)...

@freakazoid ... and reputation.

But a mechanism in which:

1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
4. Search could be distributed.
5. Auditing against false/misleading indexing was supported.
6. Original authorship / first-publication was known

... might disrupt things a tad.

Somewhat more:

NB: the reputation bits might build off social / netgraph models.

But yes, I've been thinking on this.

@enkiv2 I know SEARX is:

Also YaCy as sean mentioned.

There's also something that is/was used for Firefox keyword search, I think OpenSearch, a standard used by multiple sites, pioneered by Amazon.

Being dropped by Firefox BTW.

That provides a query API only, not a distributed index, though.

@freakazoid @drwho

@dredmorbius @enkiv2 @freakazoid YaCy isn't federated, but Searx is, yeah. YaCy is p2p.
@dredmorbius @enkiv2 @freakazoid Also, the initial criticism of the URL system isn't entirely there: the DNS is annoying, but isn't needed for accessing content on the WWW. You can directly navigate to public IP addresses and it works just as well, which allows you to skip the DNS. (You can even get HTTPS certs for IP addresses.)

Still centralized, which is bad, but centralized in a way that you can't really get around in internetworked communications.

@kick HTTP isn't fully DNS-independent. For virtualhosts on the same IP, the webserver distinguishes between content based on the host portion of the HTTP request.

If you request by IP, you'll get only the default / primary host on that IP address.

That's not _necessarily_ operating through DNS, but HTTP remains hostname-aware.

@enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 IP is also worse in many ways than using DNS. If you have to change where you host the content, you can generally at least update your DNS to point at the new IP. But if you use IP and your ISP kicks you off or whatever, you're screwed; all your URLs are new invalid. Dat, IPFS, FreeNet, Tor hidden sites, etc, don't have this issue. I suppose it's still technically a URL in some of these cases, but that's not my point.

@freakazoid Question: is there any inherent reason for a URL to be based on DNS hostnames (or IP addresses)?

Or could an alternate resolution protocol be specified?

If not, what changes would be required?

(I need to read the HTTP spec.)

@kick @enkiv2

@dredmorbius @kick @enkiv2 HTTP URLs don't have any way to specify the lookup mechanism. RFC3986 says the part after the // and optional authentication info followed by @ is a "registered name" or an address. It doesn't say the name has to be resolved via DNS but does say it is up to the local system to decide how to resolve it. So if you just wanted self-certifying names or whatever you can use otherwise unused TLDs the way Tor does with .onion.

@freakazoid @dredmorbius @kick @enkiv2
Just to point out -- URLs/URIs are W3C specced but aren't part of HTTP. (You guys know this but it's important to make this distinction here.) HTTP URLs are always over HTTP & so can't be content-addressed -- they're always host-based. But you can stick an SSB, IPFS, or onion address in an HTML anchor tag.

@enkiv2 Also, "host" can be abused in all kinds of interesting ways -- a host that accepts search parameters and forwards to matching content, e.g.

(A search engine in "I'm feeling Lucky" mode, say.)

@freakazoid @kick

@dredmorbius @enkiv2 @freakazoid @kick
It can be, & web tech does. But they're all single points of failure. If there's host-based addressing at all, then there's always a machine that needs to stay up forever or else your data is inaccessible.

@enkiv2 It's also a transition path, which addresses another element of this question.

If we're looking at coming up with a DNS-independent addressing scheme, then operating a set of reflectors, relays, or gateways (similar to Usenet-Email, Usenet-Web, or Internet-BBC gateways), might offer a path.

The relays _might_ be an online infrastructure, including a distributed one (in both IP and namespace) _or_ a locally-provisioned one as an HTTP or Tor proxy.

@freakazoid @kick


@dredmorbius @enkiv2 @freakazoid @kick You can try to fight this way, but you're loosing your time. A global sihft of paradigm in terms of cybersoace architecture is necessary. Still, by the mean time, we can. find clever trucj to fuck them but, according to me, this should never distractvus from building our own standards and alternative. cyberspace architecture. We're ahead microsoft. In termscof concepts. Never loose that of sight.

Sign in to participate in the conversation

Server run by the main developers of the project 🐘 It is not focused on any particular niche interest - everyone is welcome as long as you follow our code of conduct!