Anonymous Web Searching (& Decentralized Search Engines)

December 15, 2007 by sharky

Faroo
ixquick
Majestic-12
YaCy

Most existing search engines use a ‘centralized’ architecture combined with ranking algorithms to rank the documents crawled into their databases. The best-known example is Google’s PageRank system. However well they perform on generic web page queries, popular search engines are not conducive to finding illegal content on the Internet.

Search engines also log your search queries, including the keywords, IP address and other data about the search, even if you didn’t knowingly provide any information. This info is kept in their databases, and there’s no telling how long it is kept or what is done with it. Giants like Microsoft (MSN) and Yahoo keep their retention policies private, and Google’s is questionable at best.

Want to use Google Search anonymously? Try Scroogle! They offer free anonymous Google searching through their site.

So, what’s the worst that can happen, really?

Well, your search queries could be logged and published publicly on the Internet, for one. Then a not-so-sophisticated reverse DNS check on a logged IP address reveals the ISP, whose records can eventually tie that address back to the exact account holder. A remote possibility? Meet www.aolstalker.com:

In August 2006, AOL released the search logs of roughly 650,000 of its users, covering a three-month period. If you were one of the randomly chosen users, everything you searched for from March 2006 to May 2006 is now public information on the Internet.

Check out the links below to start browsing some past search histories! Whee!

http://www.aolstalker.com - This is a fun site for the AOL lists!

http://aolpsycho.com/ - This site is funny, too!

http://www.gregsadetsky.com/aol-data/ - The entire list can be downloaded from here.

Common solutions:

A ‘decentralized search engine’ is one that relies completely on content supplied by others - or peers - whether that is actual files or indexed web pages/links. Thus, search results are completely uncensored and unfiltered, which is advantageous and ominous at the same time. Some see this as desirable for the variety of documents returned, free of the influence of Google or Yahoo!. Others see it as an open door to all types of spam and ‘free’ advertising, since it’s unmonitored and user-driven. We feel that even a decentralized search engine will require some form of moderation.
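To make the idea concrete, here is a toy sketch of a search that is answered by peers rather than by one central index. It is only an illustration under our own assumptions: the `Peer` class, the `network_search` function and the example.org URLs are hypothetical, and this is not how Faroo, YaCy or any other real client is implemented.

```python
from collections import defaultdict


class Peer:
    """A hypothetical peer holding a small inverted index of pages it crawled."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # word -> set of URLs containing it

    def add_page(self, url, text):
        # Index every word of the page text under its lowercase form.
        for word in text.lower().split():
            self.index[word].add(url)

    def search(self, query):
        """Return the URLs on this peer that contain every word of the query."""
        words = query.lower().split()
        if not words:
            return set()
        results = set(self.index.get(words[0], set()))
        for word in words[1:]:
            results &= self.index.get(word, set())
        return results


def network_search(peers, query):
    """Ask every peer and merge the answers; no single peer holds the full index."""
    merged = set()
    for peer in peers:
        merged |= peer.search(query)
    return sorted(merged)


alice = Peer("alice")
bob = Peer("bob")
alice.add_page("http://example.org/a", "private web search tips")
bob.add_page("http://example.org/b", "web search without logging")
bob.add_page("http://example.org/c", "cooking recipes")

print(network_search([alice, bob], "web search"))
```

Notice that nothing in the sketch filters or ranks what a peer contributes, which is exactly why spam and moderation become a concern in a real network.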

An anonymous decentralized search engine promotes user security and can provide applicable search results. The concept itself isn’t a new one - this idea has been tossed around for years. Here are a couple of examples of how this emerging technology is being applied today.

Faroo (Peer-to-peer Web Search)

FREEWARE

What is it? Faroo is a program that implements anonymous and encrypted web searching based on peer-to-peer technology, rather than relying on the central-server approach used by conventional search engines. In short, Faroo is a decentralized peer-to-peer search tool that relies on search information/feedback from other users, thus cutting out the cliquey ‘corporate politics’ of other search engines.

What isn’t it? Faroo is not a P2P file sharing application. It is a search tool: specific searches can be refined, and it offers privacy protection through encrypted search queries.

Installation: Not so simple! Right now Faroo is in beta testing (as of Dec. 10/07), so you must visit their website and email them a request to become a beta tester. They’ll send you an auto-response email (“As soon as we extend our beta test we will invite you”), so you’ll have to wait for an invitation. Even we at FileShareFreak didn’t have the clout to jump the queue on this, so we’re waiting as well.

Beta Update: OK, maybe we DO have some clout after all (or maybe we just got lucky). They emailed us back the next day with the necessary beta testing information. FileShareFreak has 10 free “Beta Test Invites” available for the first 10 people who email us and request them. Visit our contact page.

Our Notes: We like Faroo because we are partial to privacy! Faroo encrypts searches (and the results) - however, downloading files found through it is not encrypted. IP blocking software is still needed.

The Faroo P2P Search application runs in the background, and the program interface (search) is through your web browser, so it couldn’t be easier. Faroo’s search results are a little thin right now, since it’s using pages indexed by contributors to its Alpha version, but content shouldn’t be a problem soon.


ixquick.com (Private Web Search)

FREEWARE

ixquick.com is a search engine that protects your privacy by deleting the cookies and log files generated during traditional search queries. Additionally, all other data kept by ixquick is deleted within 48 hours, including IP addresses. Since ixquick is a metasearch engine, search results are comparable to those of any major search engine. (In layman’s terms, a ‘metasearch engine’ relies on other search engines, like Google, MSN etc., to produce its search results.)
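For the curious, the ‘metasearch’ idea can be sketched in a few lines: take the ranked lists returned by several engines and merge them into one. The scoring rule below (each URL earns 1/rank points per engine) is our own assumption for illustration, not ixquick’s actual method, and the `merge_results` function and example.org URLs are hypothetical.

```python
def merge_results(result_lists):
    """Combine ranked URL lists from several engines into one ranking.

    Each URL earns 1/rank points from every engine that returned it,
    so pages listed high by more than one engine rise to the top.
    """
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Stand-ins for the answers of two different search engines.
engine_a = ["http://example.org/1", "http://example.org/2", "http://example.org/3"]
engine_b = ["http://example.org/2", "http://example.org/4"]

print(merge_results([engine_a, engine_b]))
```

Here the page that both engines returned ends up first, even though only one of them ranked it at the top.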

Read the ixquick statement on protecting privacy for more information.

Our Notes: While the searching showed good returns and speed, this is not a service for the truly paranoid as a standalone solution. How does one truly ascertain that this data is being deleted after 48 hours? Thus, it’s probably best to use ixquick over an anonymous web proxy, as well.

Majestic-12 (Distributed Web Search)

FREEWARE

Majestic-12 is developing a search engine scalable to billions of web pages that is based on support by the community. To quote them:

There are millions web sites out there, with billions of pages and so far only a handful of huge companies were able to create a search engine that can provide relevant information to the users. Big companies control the entry point to the data you seek, and neither you nor web masters who run the sites have a say in the matter.

Our Notes: Start searching their indexed sites at the Majestic-12 homepage - the search box is in the top-right corner. Also, you can support the project by becoming a node in the data crawl.

YaCy (Distributed Web Search) v0.55

FREEWARE

What is it? YaCy is a personal peer-to-peer web crawler and web search engine. Because it uses a decentralized architecture, web crawls (searches) are collaborative with all other YaCy peers. Since there is no ‘central server’ for searching, searches are anonymous and uncensored.

What isn’t it? YaCy is not technically a P2P file sharing application. But it can be used similarly - a search result shows text, image, audio and video content with direct links to MP3 and video files.

Installation:

1. Download the *.tar file from their homepage (the file name is similar to this - “yacy_pro_v0.55_20071004_4145.tar”), and extract all files to your hard drive (WinRAR is a recommended program to open *.tar files).

2. Next, run “startYACY.bat” - this should launch the DOS-style command shell (see below) and your web browser a short time after.

A look at the YaCy 'DOS' command shell

3. In the browser window that opens, you’ll need to enter a username, password and Peer Name to configure yourself as a peer. Afterwards, click the “Set Configuration” button at the bottom of the page.

The YaCy configuration webpage

4. Choose one of the 4 options that are now listed under the “Set Configuration” button. You’ll then be asked to log in with your username and password in the popup window (see screenshots below). Finally, you should be logged in to YaCy in the browser, and able to perform searches and crawls.

Properly configured 'YaCy' LOGIN settings

The YaCy LOGIN popup window

Successfully logged in to YaCy

(Note: The configuration part only has to be done once - the next time you use YaCy (by running “startYACY.bat”) you’ll only have to enter your username and password in the popup window.)

Our Notes on YaCy: We had some issues with the number of search results - for multimedia files they were few and far between. Other (text) searches were relatively good by comparison.

Advanced: It is possible to run a decentralized search client through anonymous proxy software (such as ‘Tor’) or even a proxy server (anonymous web-based or otherwise). This worked for both Faroo and YaCy - although this may be overkill, since the address you search from in the browser stays the same: 127.0.0.1, the local machine on which the client runs.
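As a rough idea of what ‘going through a proxy’ means in practice, here is a minimal sketch using Python’s standard library. It assumes a local HTTP proxy is listening on 127.0.0.1:8118 (the default port of Privoxy, which is often chained to Tor); the address, the `PROXY` constant and the `make_proxied_opener` helper are our assumptions, so adjust them to your own setup. It does not show how Faroo or YaCy themselves are configured.

```python
import urllib.request

# Assumption: a local HTTP proxy (for example Privoxy forwarding to Tor)
# is listening on this address. Change it to match your own setup.
PROXY = "http://127.0.0.1:8118"


def make_proxied_opener(proxy_url=PROXY):
    """Build a URL opener that sends HTTP and HTTPS requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxied_opener()

# With the proxy running, a request made through the opener would be relayed
# by the proxy instead of leaving directly from your own IP address:
# page = opener.open("http://example.org/").read()
```

The request line is left commented out so the sketch can be run without a proxy; uncomment it only once the proxy is up.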
