<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Google MD5 Hash Search Engine</title>
	<atom:link href="http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/</link>
	<description>Security Tools and Tips</description>
	<lastBuildDate>Fri, 18 Nov 2011 18:51:25 -0600</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
	<item>
		<title>By: Onur Safak</title>
		<link>http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/comment-page-1/#comment-58303</link>
		<dc:creator>Onur Safak</dc:creator>
		<pubDate>Mon, 14 Jul 2008 23:25:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/#comment-58303</guid>
		<description>but how will you benefit from storing them into google

It helps a lot because even if the site goes offline, google cache will still have the hash

Actually, the benefit is the storage and the ability to search hundreds of gigabytes, even terabytes of result data from the index.

And I don&#039;t like the idea of disturbing one of the best internet services, with a great team behind it.</description>
		<content:encoded><![CDATA[<p>but how will you benefit from storing them into google</p>
<p>It helps a lot because even if the site goes offline, google cache will still have the hash</p>
<p>Actually, the benefit is the storage and the ability to search hundreds of gigabytes, even terabytes of result data from the index.</p>
<p>And I don&#8217;t like the idea of disturbing one of the best internet services, with a great team behind it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dragos Lungu</title>
		<link>http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/comment-page-1/#comment-5495</link>
		<dc:creator>Dragos Lungu</dc:creator>
		<pubDate>Wed, 17 Oct 2007 18:41:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/#comment-5495</guid>
		<description>@photoshop 
It helps a lot because even if the site goes offline, google cache will still have the hash</description>
		<content:encoded><![CDATA[<p>@photoshop<br />
It helps a lot because even if the site goes offline, google cache will still have the hash</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: photoshop</title>
		<link>http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/comment-page-1/#comment-5466</link>
		<dc:creator>photoshop</dc:creator>
		<pubDate>Wed, 17 Oct 2007 04:00:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/#comment-5466</guid>
		<description>but how will you benefit from storing them into google</description>
		<content:encoded><![CDATA[<p>but how will you benefit from storing them into google</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Brown</title>
		<link>http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/comment-page-1/#comment-292</link>
		<dc:creator>Tim Brown</dc:creator>
		<pubDate>Sun, 24 Jun 2007 12:18:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/#comment-292</guid>
		<description>No, not at all, everyone sees the same content.  The point here is that Google doesn&#039;t always cache the body of the web page, but will always cache the title.  Hence, when we present results for any given word, we present them in both areas.  The idea is that should we ever feel like closing the PoC, Google will continue to remember :).</description>
		<content:encoded><![CDATA[<p>No, not at all, everyone sees the same content.  The point here is that Google doesn&#8217;t always cache the body of the web page, but will always cache the title.  Hence, when we present results for any given word, we present them in both areas.  The idea is that should we ever feel like closing the PoC, Google will continue to remember <img src='http://www.dragoslungu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> .</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben</title>
		<link>http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/comment-page-1/#comment-290</link>
		<dc:creator>Ben</dc:creator>
		<pubDate>Sun, 24 Jun 2007 04:01:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.dragoslungu.com/2007/06/22/google-md5-hash-search-engine/#comment-290</guid>
		<description>It&#039;s a rather interesting concept: leaving storage and indexing up to google, instead of having to handle it all yourself.  So all you do is generate the hashes as google requests the page.  The only problem with it is, I don&#039;t know:
1) how far Google is willing to crawl: a search for &quot;site:www.nth-dimension.org.uk&quot; reveals some 7K results for the domain name.
2) Poking around in the URL, it looks like they only have about 2400 words (change the startline value)
3) There&#039;s one that is more methodical: http://reverse.me.uk/, but a search for that domain reveals a measly 49 results... so somehow Google will stop early if it all looks the same?  I&#039;m not sure.
4) I&#039;ve tried to make my own, just for the fun of it.  My blog is searchable by google, so hopefully there&#039;ll be more than 49 results... we&#039;ll see?  Here&#039;s the hash generator: http://darwin.servehttp.com/cgi-bin/hash.pl</description>
		<content:encoded><![CDATA[<p>It&#8217;s a rather interesting concept: leaving storage and indexing up to google, instead of having to handle it all yourself.  So all you do is generate the hashes as google requests the page.  The only problem with it is, I don&#8217;t know:<br />
1) how far Google is willing to crawl: a search for &#8220;site:www.nth-dimension.org.uk&#8221; reveals some 7K results for the domain name.<br />
2) Poking around in the URL, it looks like they only have about 2400 words (change the startline value)<br />
3) There&#8217;s one that is more methodical: <a href="http://reverse.me.uk/" rel="nofollow">http://reverse.me.uk/</a>, but a search for that domain reveals a measly 49 results&#8230; so somehow Google will stop early if it all looks the same?  I&#8217;m not sure.<br />
4) I&#8217;ve tried to make my own, just for the fun of it.  My blog is searchable by google, so hopefully there&#8217;ll be more than 49 results&#8230; we&#8217;ll see?  Here&#8217;s the hash generator: <a href="http://darwin.servehttp.com/cgi-bin/hash.pl" rel="nofollow">http://darwin.servehttp.com/cgi-bin/hash.pl</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>

