Peer-2-peer free software web search (YaCy) goes 1.0

5 replies
lembas
Offline
Joined: 05/13/2010

Here's some info

http://fsfe.org/news/2011/news-20111128-01.html

Here's an older PDF presentation

http://yacy.net/material/YaCy_FSCONS_2010.pdf

It's looking cool but not even present in Debian yet. Perhaps one day it could be part of Trisquel's default install. Or what do you guys think?

BinaryDigit
Offline
Joined: 11/30/2010

I think it's a great idea. I can imagine other applications as well, like peer-to-peer web site recommendations and the old "web of trust" (WOT) idea, used to flag safe/unsafe sites.

A question rarely asked is why computer users have to search in the first place. Searching means I have to type a search term and then validate, filter and sort the results; the computer should be doing that. We need to move beyond just search.

ivaylo
Offline
Joined: 07/26/2010

On 29.11.2011 (Tue) at 18:06 +0100, mikko.viinamaki[@nospam] wrote:

> It's looking cool but not even present in Debian yet. Perhaps one day it
> could be part of Trisquel's default install. Or what do you guys think?

The download section at the YaCy site points to a wiki [1] with
information about a repository (debian.yacy.net) with deb packages. You
can use it with Trisquel. Might not work though.

I tried it once, and I think I used the deb repository YaCy
provides. [1] I found the web interface too slow and the search results
too often inaccurate. I keep telling myself that I should test it
again, but I haven't yet.

[1] http://www.yacy-websuche.de/wiki/index.php/En:DebianInstall

t3g
Offline
Joined: 05/15/2011

I'm starting to get déjà vu reading this, because I think someone may have tried a P2P search engine like this 5+ years ago. I'm assuming the tech is like BitTorrent: it stores a "seed" on your hard drive, shares it with others, and grows as you add more data to it.

To be honest though, this would be much better if it didn't rely on an install and ran purely in the web browser. I say this so users of any platform can take advantage of it without installing anything, as long as they have a modern web browser (doesn't IE8 support local storage?). Of course, if they clear their cookies and/or local storage, the data is gone unless it is synced to a central server.

EDIT: I just tried the web search portal (which uses the program's seeds) at http://search.yacy.net/ and the results are terrible. I only threw popular terms at it (Jesus, Lady Gaga, cat) and got either a bunch of random nonsense or 0 results, as with the search term "cat".

ivaylo
Offline
Joined: 07/26/2010

On 29.11.2011 (Tue) at 21:51 +0100, tegskywalker[@nospam] wrote:
> I'm starting to get deja-vu when reading this because I think someone may or
> may not have tried this like 5+ years ago with the P2P search engine.

Wikia Search used a peer-to-peer crawler, called Grub. [1] That is all
that I can remember.

>
> To be honest though, this would be much better if it didn't rely on an
> install and was purely in the web browser.

Actually it wouldn't be. The browser is already bloatware. It uses too
much memory and sometimes too much CPU. There are still rare
implementations (Midori, Epiphany; text-based things like w3m, links,
lynx ...) that are not so greedy. Firefox and its forks are the worst:
above 400MB, even above 1GB. Come on! Firefox releases after version 4
even pollute memory faster than I can pronounce "random access memory".

The tasks of crawling, indexing etc. etc. just do not fit in the
browser.

> I say this so users of any platform can take advantage of it

YaCy has versions for proprietary operating systems.

> and not install anything.

When the user has a real operating system, installing is not that big of
a deal. In GNU/Linux it is usually a few words on the command line or a
few clicks in a GUI. I have almost no experience with *BSDs, but
installation there is also relatively easy. If that is hard, I don't
know what easy is.

> Of course if they clear their cookies and/or local storage, then the data is gone unless
> it becomes centralized syncing with a server.

If we assume that everybody on the p2p search network has access to all
the crawled data, then theoretically the data should still be on the
network (with everybody else) and easily fetched.
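
To make that assumption concrete, here is a minimal sketch in Python. It
is not YaCy's actual protocol: the Peer class, the publish/search
helpers, the hash-based peer selection and the replication factor of 2
are all made up for illustration. The point is only that each index
entry lives on more than one peer, so one peer wiping its storage does
not lose it.

```python
import hashlib


class Peer:
    """A hypothetical network participant holding part of the index."""

    def __init__(self, name):
        self.name = name
        self.index = {}  # search term -> set of URLs


def peers_for(term, peers, replicas=2):
    """Pick the peers responsible for a term.

    Every peer is ranked by a hash of (term, peer name) and the first
    `replicas` peers are chosen. This is an assumed scheme, chosen so
    that any client can compute the same answer without a central server.
    """
    def rank(peer):
        key = f"{term}|{peer.name}".encode("utf-8")
        return hashlib.sha256(key).hexdigest()

    return sorted(peers, key=rank)[:replicas]


def publish(term, url, peers, replicas=2):
    """Store a (term, url) entry on every responsible peer."""
    for peer in peers_for(term, peers, replicas):
        peer.index.setdefault(term, set()).add(url)


def search(term, peers, replicas=2):
    """Ask the responsible peers and merge whatever they return."""
    results = set()
    for peer in peers_for(term, peers, replicas):
        results |= peer.index.get(term, set())
    return results


if __name__ == "__main__":
    network = [Peer(name) for name in ("alice", "bob", "carol", "dave")]
    publish("cat", "http://example.org/cats", network)

    # One of the responsible peers clears its local storage...
    peers_for("cat", network)[0].index.clear()

    # ...but the entry is still fetched from the other replica.
    print(search("cat", network))
```

With only one replica per term, the same wipe would lose the entry,
which is the situation described for a purely browser-local store.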

P.S. The one thing I don't like about YaCy is that it is written in
Java. I think this makes software slow. But if it becomes popular I bet
someone will write C or at least C++ libraries, crawlers, indexers,
clients etc. etc. that use the protocol.

[1] https://en.wikipedia.org/wiki/Grub_%28search_engine%29

Michał Masłowski
Offline
Joined: 05/15/2010

> The tasks of crawling, indexing etc. etc. just do not fit in the
> browser.

It's also useful to do these tasks when not running the browser, or from
machines where browsers aren't useful (e.g. servers with better network
connections).

> P.S. The one thing I don't like about YaCy is it is written in Java. I
> think this makes software slow. But if it becomes popular I bet someone
> will write C or at least C++ libraries, crawlers, indexers, clients etc.
> etc. that use the protocol.

Java is the slowest way known to me to write a true(1)-compatible
program, and it's probably similar for typical quick programs used in
shell scripts; it's faster for long-running programs on a supported
architecture (not my MIPS64-like laptop). For me the problems with Java
are that it makes source longer, makes some bugs easier to introduce,
and makes some nice ways of designing programs harder to use than in
e.g. Python or Haskell. It also uses more memory than C, requires a huge
JDK package, and programs using it are distributed in a way unfriendly
to GNU/Linux distros (e.g. with dependencies bundled). I don't know if
any of these arguments apply to this case.