Web Crawlers: Foreign-Language Reference Literature for Translation

  • 2025/6/3 10:17:20


... crawling process, even if multithreading is used, will be insufficient for large-scale engines that need to fetch large amounts of data rapidly. When a single centralized crawler is used, all the fetched data passes through a single physical link. Distributing the crawling activity across multiple processes can help build a scalable, easily configurable, fault-tolerant system. Splitting the load decreases hardware requirements and at the same time increases the overall download speed and reliability. Each task is performed in a fully distributed fashion; that is, no central coordinator exists.
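One common way to split crawling work without a central coordinator is to hash each URL's host name, so that every process can work out on its own which peer is responsible for a URL. The sketch below illustrates that idea only; the paper does not say which assignment scheme it uses, and the names `owner_of` and `partition` and the choice of SHA-1 are assumptions made for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def owner_of(url: str, num_crawlers: int) -> int:
    """Return the index of the crawler process responsible for this URL.

    Assumption: ownership is decided per host, so all pages of one site
    go to the same process.
    """
    host = urlsplit(url).hostname or ""
    # Use a stable hash: the built-in hash() is salted per process for
    # strings, so different crawlers would disagree on the result.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


def partition(urls, num_crawlers):
    """Group URLs by owning crawler index (hypothetical helper)."""
    buckets = {i: [] for i in range(num_crawlers)}
    for url in urls:
        buckets[owner_of(url, num_crawlers)].append(url)
    return buckets
```

Because every process evaluates the same function, a crawler that discovers a link it does not own can forward it directly to the owning peer. Note that this simple modulo scheme reassigns most hosts when the number of processes changes.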

Ⅵ. PROBLEM OF SELECTING MORE “INTERESTING”

A search engine is aware of hot topics because it collects user queries. The crawling process prioritizes URLs according to an importance metric such as similarity (to a driving query), back-link count, PageRank, or their combinations and variations. Recently, Najork et al. showed that breadth-first search collects high-quality pages first and suggested a variant of PageRank. However, at the moment, search strategies are unable to select exactly the “best” paths because their knowledge is only partial. Due to the enormous amount of information available on the Internet, a total crawl is at the moment impossible; thus, pruning strategies must be applied. Focused crawling and intelligent crawling are techniques for discovering Web pages relevant to a specific topic or set of topics.
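Prioritizing URLs by an importance metric can be sketched with a priority queue. The example below uses back-link count as the score, computed only from the pages seen so far, which reflects the partial knowledge mentioned above. The `Frontier` class and the `backlink_counts` helper are hypothetical names for illustration, not part of the paper.

```python
import heapq
import itertools


class Frontier:
    """URL frontier that pops the highest-scoring URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker: insertion order

    def push(self, url, score):
        """Add a URL once; return False if it was already added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, next(self._counter), url))
        return True

    def pop(self):
        """Remove and return the URL with the highest score."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


def backlink_counts(links):
    """Count in-links per URL from a {page: [outgoing links]} mapping."""
    counts = {}
    for targets in links.values():
        for target in set(targets):  # count each linking page once
            counts[target] = counts.get(target, 0) + 1
    return counts
```

Swapping in a different metric, such as similarity to a driving query, only changes the score passed to `push`. Scores are fixed at insertion time in this sketch; a real crawler would need to re-rank as new links are discovered.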

CONCLUSION

In this paper we conclude that complete web crawling coverage cannot be achieved, due to the vast size of the whole WWW and to limited resource availability. Usually some kind of threshold is set up (number of visited URLs, level in the website tree, compliance with a topic, etc.) to limit the crawling process over a selected website. Search engines use this information to store and refresh the most relevant and up-to-date web pages, thus improving the quality of retrieved content while reducing stale content and missing pages.
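The thresholds mentioned above (number of visited URLs and level in the website tree) can be sketched as a bounded breadth-first traversal. This is a minimal illustration, not the paper's implementation: `bounded_crawl` is a hypothetical name, `get_links` stands in for whatever fetches and parses a page, and the default limits are arbitrary.

```python
from collections import deque


def bounded_crawl(start, get_links, max_pages=100, max_depth=3):
    """Breadth-first traversal that stops at a page or depth threshold.

    get_links is an assumed callable: url -> iterable of linked URLs.
    Returns the URLs in the order they were visited.
    """
    visited = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # pages at the depth limit are visited but not expanded
        for nxt in get_links(url):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited
```

A topic-compliance threshold could be added in the same loop by skipping links whose page fails a relevance check before they are queued.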
