0219 HTTP_USER_AGENT Indy Nutch Jakarta
I came across some odd HTTP_USER_AGENT values in my logs.
NutchCVS/0.7.1
Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.
If you're reading this, chances are you've seen a Nutch-based robot visiting your site while looking through your server logs. Our software obeys robots.txt files and robot META tags in HTML. These are the standard mechanisms for webmasters to tell web robots which portions of a site a robot is welcome to access.
Sysadmins/robots.txt
We're a software project, not a service, so please understand that a misbehaving crawler appearing with our Agent string is not run by us. Our software may be run by anyone. However, we'd still like to hear about any bad behavior. If possible, please include the name of the domain and some representative log entries. We can be reached at nutch-agent@lucene.apache.org.
Our software obeys the robots.txt exclusion standard, described at http://www.robotstxt.org/wc/exclusion.html#robotstxt. Different installations of the Nutch software may specify different agent names, but all should respond to the agent name "Nutch". Thus to ban all Nutch-based crawlers from your site, place the following in your robots.txt file:
User-agent: Nutch
Disallow: /
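One way to check that such a rule has the intended effect is to feed it to a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`; the helper name `is_allowed` and the `example.com` URL are made up for illustration, and the standard-library matcher is only an approximation of how any particular crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted above, one directive per line.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # hypothetical URL, used only for the check
    print(is_allowed(ROBOTS_TXT, "NutchCVS/0.7.1", "http://example.com/page"))
    print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", "http://example.com/page"))
```

The standard-library parser compares the token before the first `/` in the agent string, case-insensitively and by substring, so `NutchCVS/0.7.1` falls under the `Nutch` rule.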
compatible; Indy Library
This one shows up a lot, but accounts of what it is differ.
Robot Name: Indy Library
Version:
Agent_String: Mozilla/3.0 (compatible; Indy Library)
Comments: Update request came from 81.62.187.162 at 2005-02-01 12:33 GMT
URL: http://www.indyproject.org/
E-mail:
First Visit: 2002-07-03 14:19:41+10
Last Visit: 2006-02-18 23:36:29+11
Hits: 27 (This month)
IP Addr: 172.200.108.61 193.36.230.96 217.219.132.222 220.137.107.66 220.137.110.4 220.137.88.136 220.137.89.138 220.137.89.204 220.137.90.79 220.137.91.249 220.137.92.199 220.137.92.245 220.137.92.43 220.137.94.118 24.210.239.46 61.216.136.45
The Indy Project
The Indy Project is an Open Source group that maintains several active projects that grew out of the original Indy.Sockets project.
Indy.Sockets
Indy.Sockets is an open source socket library that supports clients, servers, TCP, UDP, raw sockets, as well as over 100 higher level protocols such as SMTP, POP3, NNTP, HTTP, and many more. Indy.Sockets is available for C#, C++, Delphi, Visual Basic.NET, any .NET language, and Kylix. Indy.Sockets FCL build is a managed assembly and is compatible with the Microsoft .NET framework, and Mono.
SetEnvIf User-Agent ^Mozilla.*Indy keep_out
order allow,deny
allow from all
deny from env=keep_out
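To see which agent strings the `SetEnvIf` pattern above would flag before deploying it, the same regular expression can be tried offline. This is a minimal sketch in Python, assuming the pattern is applied as a case-sensitive, unanchored search (only the `^` pins it to the start of the header); the function name `would_keep_out` and the second sample string are illustrative.

```python
import re

# Same expression as in the SetEnvIf line above.
KEEP_OUT = re.compile(r"^Mozilla.*Indy")


def would_keep_out(user_agent: str) -> bool:
    """Return True if the User-Agent header would set the keep_out variable."""
    return KEEP_OUT.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/3.0 (compatible; Indy Library)",              # from the entry above
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",  # an ordinary browser
    ]
    for ua in samples:
        print(would_keep_out(ua), ua)
```

A client that sends a different agent string is not matched, so the rule only stops Indy-based clients that keep the library's default.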
Mediapartners-Google/2.1
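2. How do I optimize my site so that the most relevant ads are shown?
Your site's content and structure determine our ability to target ads to it. Here are some basic principles for optimizing a site:
Place ads on pages whose main content is text, because a page's context is determined by its text content alone.
If you use a robots.txt file, you need to either remove it or add the following two lines to it, so that our content crawler can crawl your site: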
User-agent:Mediapartners-Google*
Disallow:
Jakarta Commons-HttpClient
Jakarta Commons HttpClient
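SECTION 01 HttpClient Overview
Hyper-Text Transfer Protocol (HTTP) is the most widely used communication protocol on the network today. With the rapid growth of web services, many applications now build on HTTP to extend their functionality.
SECTION 02 Features
The reason to use it is its features: many of them are not provided by java.net.*, or would have to be implemented yourself, so you might as well use commons-httpclient directly.
Written in pure Java, implementing standard HTTP 1.0 and 1.1
Implements all HTTP methods (GET, POST, PUT, DELETE, HEAD, OPTIONS, and TRACE)
Supports HTTPS
Supports proxies in their various configurations
Uploads files via multipart forms
Supports authentication
Lets you set the maximum number of connections
Automatic cookie handling
Optimized request and response handling
Supports HTTP 1.0 KeepAlive and HTTP 1.1 persistent connections
Direct access to the response code and headers sent by the server
Lets you set connection timeouts
Implements the Command Pattern to allow parallel processing and efficient connection reuse.
It is open source.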
P.Arthur
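Ha, I happened to notice today that the forum had 20 users online, which is rare, and for many of the guests the operating system column read "P.arthur crawler", which gave me a start. After checking carefully, it turned out to be a visit from Peking University's Tianwang search engine.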
FunWebProducts)
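MyWebSearch from FunWebProducts
MyWebSearch is a bundle of tools, including iCursor for customizing the mouse pointer and a pop-up blocker, among others. It is usually installed automatically when you visit certain websites, so many people end up with this kind of malicious software without realizing it. The removal procedure is also misleading: people believe they have removed it when it is in fact still on the computer.
Looking through visitors' User Agent strings today, I found that some visitors' browsers have a plug-in called FunWebProducts installed. After searching around, I found the following:
FunWebProducts is a bundled installer containing advertising-oriented spyware (adware). It is downloaded to your system automatically, with or without your permission.
FunWebProducts is not itself spyware or adware, but it may install other spyware or adware. The bundled spyware or adware can collect web browsing session information and send it back to the company's servers, and can download new ads and display them in a pop-up window while you browse a site.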
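The agent strings collected in this post can be spotted in a log with plain substring checks. The sketch below is only an illustration: the markers are the names that appear in the headings above, the labels are my own shorthand, and the function name `classify` is made up. A real log will contain many agents this table does not know about.

```python
# Markers are taken from the headings in this post; treating them as
# substrings of the User-Agent header is an assumption for illustration.
KNOWN_MARKERS = {
    "Nutch": "Nutch-based crawler",
    "Indy Library": "Indy.Sockets HTTP client",
    "Mediapartners-Google": "Google ad content crawler",
    "Jakarta Commons-HttpClient": "Jakarta Commons HttpClient",
    "FunWebProducts": "browser with FunWebProducts installed",
}


def classify(user_agent: str) -> str:
    """Return a short label for a User-Agent string, or 'unknown'."""
    for marker, label in KNOWN_MARKERS.items():
        if marker in user_agent:
            return label
    return "unknown"


if __name__ == "__main__":
    for ua in ("NutchCVS/0.7.1",
               "Mozilla/3.0 (compatible; Indy Library)",
               "Mediapartners-Google/2.1"):
        print(classify(ua), "<-", ua)
```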