Posts Tagged ‘crawler’

Google Digging Deep into “Invisible Web” to Expand Search Results

May 1st, 2008 by Vipul

Did you know that billions of pages of information cannot be reached through normal web searches? In a continuing bid to improve its coverage of the web, Google has recently started experimenting with a new technique to dig deeper into this so-called Invisible Web.

As always, Google is doing something different. It has started exploring HTML forms on a number of high-quality sites, filling them in, and crawling the URLs that correspond to the resulting queries. Sounds interesting!
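Google has not published how it does this, so the following is only a hypothetical sketch of the general idea. It assumes the form is submitted with GET, which means every submission is simply a URL. Each field gets a list of candidate values (option values for a drop-down, words taken from the site for a text box), and every combination becomes one crawlable URL. The form, its field names, and the `limit` cap are illustrative assumptions, not Google's actual parameters.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_urls(action, select_options, text_terms, limit=None):
    """Enumerate GET URLs for a hypothetical search form.

    action: the form's action URL (assumed to have no query string yet).
    select_options: <select> field name -> list of its option values.
    text_terms: text-box field name -> candidate words (e.g. words from the site).
    limit: optional cap on how many URLs to generate.
    """
    fields = {**select_options, **text_terms}
    names = sorted(fields)  # fixed field order so the URLs are deterministic
    # Each combination of field values is one possible form submission.
    combos = product(*(fields[name] for name in names))
    if limit is not None:
        combos = islice(combos, limit)
    return [action + "?" + urlencode(list(zip(names, values))) for values in combos]


if __name__ == "__main__":
    # Hypothetical form on a placeholder domain.
    for url in candidate_urls(
        "http://www.example.com/search",
        {"category": ["books", "music"]},
        {"q": ["jazz"]},
    ):
        print(url)
```

The number of combinations grows multiplicatively with each field, which is why a real crawler would need to be selective about which forms and values it tries; the `limit` parameter is a crude stand-in for that.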

This move is significant because the content behind such forms makes up a huge part of the web that search engines have not explored so far (and the impact will be larger if other search engines follow suit). Could this mean a lot more information being added to an already crowded web? Users may need to work harder to find the results they want, or they may simply see many more good, high-quality results. Who knows!

According to the official Google Webmaster Central blog, this move doesn't affect the PageRank of other pages on the same site; it should only increase the site's exposure to Google. The change also does not affect the crawling, ranking, or selection of other web pages in any significant way.

Yahoo! Slurp 3.0 is Coming

May 1st, 2008 by Vipul

Yahoo! is preparing to release the latest version of its Yahoo! Search crawler, together with some infrastructure updates that recently caused some variation in its crawl behavior.

With everything now in place, the rollout has officially begun. The new Yahoo! Slurp 3.0 obeys all robots.txt directives written for the 'Yahoo! Slurp' user-agent, though it will identify itself as Slurp 3.0 in web logs.

According to the official Yahoo! blog, you can expect the following changes over the next few weeks:

  • The crawlers will start crawling from a different and much smaller set of IP addresses, but it’ll still be from the crawl.yahoo.net domain. Any reverse DNS checks to identify our crawler will continue to work. Please note that if you’re using IP-based recognition of our crawlers, you might see a drop in crawl/coverage from Yahoo! We strongly recommend that you move to reverse DNS-based identification of Yahoo! Slurp if you’re using any other method to avoid this problem. The current set of IPs will disappear from your web logs in the next several weeks.
  • The crawlers will also publish a new user-agent, 'Yahoo! Slurp/3.0.' Existing robots.txt directives for 'Slurp' or 'Yahoo! Slurp' will continue to work, but if you have directives specific to 'Slurp/2.0,' they won't be recognized by the new crawler (though the 'Slurp/2.0' user-agent is very rarely used on the web, so you likely won't be affected). We recommend using the shorter form: User-agent: Slurp.
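The reverse DNS-based identification recommended in the first item can be sketched as a forward-confirmed lookup: reverse-resolve the visiting IP, check that the hostname is in the crawl.yahoo.net domain, then forward-resolve that hostname and make sure it maps back to the same IP (so a forged reverse record alone is not enough). This is a minimal sketch using Python's standard `socket` module, IPv4 only; the resolver functions are injectable parameters (an assumption of this sketch) so the logic can be exercised without network access.

```python
import socket


def is_yahoo_slurp(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a visitor claiming to be Yahoo! Slurp.

    reverse/forward default to the real socket resolvers (IPv4 via
    gethostbyname_ex); both return (hostname, aliases, addresses).
    """
    # Step 1: reverse lookup of the IP taken from the web log.
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 2: the hostname must belong to the crawl.yahoo.net domain.
    if not hostname.endswith(".crawl.yahoo.net"):
        return False
    # Step 3: forward lookup must map the hostname back to the same IP.
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    return ip in addresses
```

Because this checks names rather than a fixed list of addresses, it keeps working when the crawler moves to a new set of IPs, which is exactly the situation the announcement describes.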

These changes will affect the main Yahoo! Web Search crawlers. Crawlers that similarly respect the Yahoo! Slurp directive but identify themselves more specifically, such as Yahoo! Slurp China and others, will not be impacted.
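The user-agent point can be illustrated with Python's standard `urllib.robotparser`. The robots.txt below is a hypothetical example with one group for 'Slurp/2.0' and one for the recommended short 'Slurp' name. Note that this uses Python's own matching rule (a case-insensitive substring test against the part of the user-agent before the first '/'), which is not necessarily how Yahoo!'s crawler parses robots.txt; it only shows why the short form is the safer choice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a version-specific group and the short "Slurp" group.
ROBOTS_TXT = """\
User-agent: Slurp/2.0
Disallow: /old/

User-agent: Slurp
Disallow: /private/
"""


def can_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Parse robots_txt with the stdlib parser and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The short "Slurp" group applies to the 3.0 crawler, so /private/ is blocked.
    print(can_fetch("Yahoo! Slurp/3.0", "http://www.example.com/private/page"))  # False
    # The "Slurp/2.0" group does not apply, so its /old/ rule is ignored.
    print(can_fetch("Yahoo! Slurp/3.0", "http://www.example.com/old/page"))  # True
```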
