12mo ago

SearXNG should be a federated search engine

github.com GitHub - searxng/searxng: SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.

All the posts about Reddit blocking everyone except Google and Brave got me thinking: what if SearXNG was federated? I.e., when data is retrieved via a provider's API, that data is then federated to all other instances.

It would spread the API load out amongst instances, removing the API bottlenecks that come from search providers.

It would allow for more anonymous search, since users could cycle between instances and get the same results.

Geographic bias would be a thing of the past.

Other than ActivityPub overhead and storage, which could be reduced by federating text-only content, I fail to see any downside.
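
A minimal in-memory sketch of the result-sharing idea above, under loud assumptions: `Instance`, `fetch_from_provider`, and `receive` are hypothetical names, the provider call is a placeholder, and nothing here uses SearXNG's or ActivityPub's real interfaces. It only shows one instance paying for the provider call and its peers answering the same query from the shared copy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalize(query: str) -> str:
    """Collapse case and whitespace so equivalent queries share one cache key."""
    return " ".join(query.lower().split())


@dataclass
class Instance:
    """Hypothetical stand-in for one search instance (not SearXNG's real API)."""

    name: str
    peers: list[Instance] = field(default_factory=list)
    cache: dict[str, list[str]] = field(default_factory=dict)
    provider_calls: int = 0

    def fetch_from_provider(self, key: str) -> list[str]:
        # Placeholder for an upstream provider API call (assumption).
        self.provider_calls += 1
        slug = key.replace(" ", "-")
        return [f"https://example.org/{slug}/{i}" for i in range(3)]

    def receive(self, key: str, results: list[str]) -> None:
        # A peer pushed text-only results to this instance; keep a copy.
        self.cache[key] = list(results)

    def search(self, query: str) -> list[str]:
        key = normalize(query)
        if key in self.cache:
            return self.cache[key]
        results = self.fetch_from_provider(key)
        self.cache[key] = results
        for peer in self.peers:
            peer.receive(key, results)
        return results


a, b = Instance("a"), Instance("b")
a.peers.append(b)
b.peers.append(a)
a.search("Reddit  API")   # a calls the provider and pushes to b
b.search("reddit api")    # b answers from the copy it received
```

The sketch ignores everything that makes this hard in practice: result freshness, trust between instances, and the storage cost mentioned above.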

Thoughts?

27 comments

I think you are not a computer programmer. Trying to build an index of the web by querying other search engines is not an efficient or sensible way to do things. Using ActivityPub for it is insane. Sharing query results in the obvious way might help a little during events where everyone searches for the same thing all at once, but in a relatively small pool of relatively sophisticated Internet users I don't think that happens often enough to justify the enormous amount of work and complexity.
On the other hand a distributed web crawler that puts its results in a free and decentralized database (one appropriate to the task; not blockchain) might be interesting. If the load on each node could be made light enough and the software simple enough that millions of people could run it at home, maybe it could be one way to build a new search engine. If that needs doing and someone has several hundred hours of free time to get it started.
- If you're looking for a distributed crawler and index:
  https://en.wikipedia.org/wiki/YaCy
  YaCy already exists and has been around for two decades.
  
  This is close to what I was thinking, but rather than crawling independently, leverage the API results from queries to build a list of sites (and then perhaps crawl). Potentially a tag index of sorts. I'm not solid on any idea as I haven't investigated SearXNG enough to see how it works under the hood, but yes, on the same plane of thought.
  
  I really want to use this, but from what I've read it basically requires a minimum of 20-30GB of RAM to be performant. The documentation also appears to be a mess and highly outdated. I'd also want to cluster it internally while still connecting with outside peers, which seems possible, but given the large resource requirement it isn't feasible with my setup.
- Well, I am one, and that includes work on products in the Fediverse. And I never said to federate the search queries.
  "Trying to build an index of the web by querying other search engines is not an efficient or sensible way to do things."
  I never made this suggestion.
  "On the other hand a distributed web crawler that puts its results in a free and decentralized database"
  Now you're getting there.
  
  Okay, sorry! There's still a long way to go before the idea becomes sufficiently well-specified to make much sense to me, though. Perhaps an examination of YaCy could give you a concrete example of the ways in which such things are complicated. One would need to do much better to end up with a suitable replacement for the ways many of us use searx.
  It was wanting to use ActivityPub and the "I fail to see any downside" which led me to read the rest of your post in a way that might've been overly pessimistic about its merits.

One of the annoying things about SearXNG is that search engines will often rate-limit an instance when a lot of people are using it. Maybe a “federated” approach would be: if results are rate limited -> send the query to another trusted SearXNG instance -> receive the results and send them back to the user. That way, people can stick with their favorite instance instead of manually switching whenever the search engines rate-limit it.
- I self-host with YunoHost; it's a good way to not bog down the system.
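
The rate-limit fallback described above could be sketched roughly like this. Everything here is an assumption for illustration: `RateLimited`, `search_with_fallback`, and the idea of a backend as a plain callable are hypothetical and not part of SearXNG.

```python
from __future__ import annotations

from typing import Callable, Iterable


class RateLimited(Exception):
    """Raised by a backend that refuses a query because it is rate limited."""


def search_with_fallback(
    query: str,
    local: Callable[[str], list[str]],
    trusted_peers: Iterable[Callable[[str], list[str]]],
) -> list[str]:
    """Try the local instance first, then each trusted peer in order."""
    try:
        return local(query)
    except RateLimited:
        pass  # local engines are rate limited; fall through to the peers
    for peer in trusted_peers:
        try:
            return peer(query)
        except RateLimited:
            continue  # this peer is limited too; try the next one
    raise RateLimited("local instance and all trusted peers are rate limited")


# Hypothetical backends standing in for real instances.
def limited(query: str) -> list[str]:
    raise RateLimited(query)


def healthy(query: str) -> list[str]:
    return [f"result for {query}"]


results = search_with_fallback("federated search", limited, [limited, healthy])
```

One design consequence worth keeping in mind: the peer that answers sees the query text, so the list of trusted instances matters for privacy as well as availability.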

I recall there is a federated search engine... somewhere. Anyone know what that was called?
- Are you thinking of YaCy?
  
  Ah, I wondered if something like that had been tried before. Looks like it is maybe still running: https://yacy.net/
  The demo isn't giving me useful search results.

Besides YaCy, there is a project for building decentralized apps, with search listed as an example. It's very early and nothing is built on it yet, though.
https://github.com/freenet/freenet-core

There was an NLnet project for the former SearX before it got discontinued.

This is a great idea that will never work because it's too expensive to maintain.

So everyone stores a part of the search index? I think you've invented a machine-readable website index with extra steps.
- Hah, could be.
