Bot/Crawler Blocking Software
Posted by warncke, 04-13-2010, 07:52 PM |
I am in the process of developing a Bot/Crawler blocking system, and I would be interested to get some feedback.
The program is designed for Apache/mod_perl/MySQL.
The blocking techniques that I am using are:
1) Request header profiling -- Hashing the request headers and user agent string of every request, then running statistical analysis to see whether the submitted headers match a common profile for that user agent. This cuts out low-effort bots that fake the user agent string but don't bother with the rest of the headers.
2) Browsing pattern analysis -- Running statistical analysis on the frequency of page accesses to look for bot-like patterns.
3) Referer tracking -- Tracking requests and referers to make sure that they match.
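As a rough illustration of technique 1, here is a minimal Python sketch (not the mod_perl implementation described in this post). It assumes the profile is built from header names in the order they arrive, that a few per-request headers are ignored, and that a fingerprint is "uncommon" when it makes up less than a chosen share of the requests seen for that user agent. The class name `HeaderProfiler`, the `VOLATILE` set, and both thresholds are hypothetical choices made for the example.

```python
import hashlib
from collections import Counter, defaultdict

# Assumption: these headers vary per request, so they are left out of the profile.
VOLATILE = {"cookie", "referer", "content-length", "if-modified-since", "if-none-match"}


class HeaderProfiler:
    """Counts header fingerprints per user agent and flags uncommon ones."""

    def __init__(self, min_samples=50, min_share=0.05):
        self.min_samples = min_samples  # hypothetical: samples needed before judging
        self.min_share = min_share      # hypothetical: share below which a profile is rare
        self.counts = defaultdict(Counter)

    @staticmethod
    def fingerprint(headers):
        # headers: list of (name, value) pairs in the order the client sent them.
        names = [name.lower() for name, _ in headers if name.lower() not in VOLATILE]
        return hashlib.sha1("\n".join(names).encode("utf-8")).hexdigest()

    def observe(self, user_agent, headers):
        # Record one request so that later requests can be compared against it.
        fp = self.fingerprint(headers)
        self.counts[user_agent][fp] += 1
        return fp

    def is_suspicious(self, user_agent, headers):
        seen = self.counts.get(user_agent, Counter())
        total = sum(seen.values())
        if total < self.min_samples:
            return False  # not enough data for this user agent to judge
        share = seen[self.fingerprint(headers)] / total
        return share < self.min_share
```

In this sketch a request that claims a well-known browser user agent but sends only `Host` and `User-Agent` would be flagged once enough genuine requests for that user agent have been observed. A real deployment would presumably persist the counts (for example in MySQL) rather than keep them in memory.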
My goal here is to create blocking software that actually detects bots, as opposed to setting access limits. The intention is that a bot will get detected on the first request, or within the first 10 or so requests.
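To make the "within the first 10 or so requests" idea concrete for technique 2, here is a small Python sketch of one possible test. It is an assumption, not the analysis actually used: it flags a client whose requests arrive either very fast or at suspiciously regular intervals, measured by the coefficient of variation of the gaps between requests. The function name `looks_automated` and all thresholds are hypothetical.

```python
from statistics import mean, pstdev


def looks_automated(timestamps, min_requests=10, max_cv=0.1, min_mean_gap=0.5):
    """Return True if request times (in seconds) look machine-generated.

    All three thresholds are illustrative assumptions and would need tuning.
    """
    if len(timestamps) < min_requests:
        return False  # too few requests to say anything
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    if avg == 0 or avg < min_mean_gap:
        return True  # pages requested faster than a person would read them
    # Coefficient of variation: near 0 means almost perfectly regular intervals.
    cv = pstdev(gaps) / avg
    return cv < max_cv
```

For example, ten requests spaced exactly two seconds apart would be flagged, while ten requests with irregular gaps of a few seconds to a minute would not. A bot that randomizes its delays would get past this check, which is why it would only make sense in combination with the other techniques.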
I am interested in general feedback on this subject, and I would like to find some server operators who would be interested in testing this with me.
|
Posted by Nortorious, 04-13-2010, 09:15 PM |
Wouldn't it just be easier to do this http://************.org/forums/stopp...ots-t4826.r00t ? It's not the same, but it would probably cause less I/O if you run a lot of sites.
|
Posted by warncke, 04-14-2010, 10:16 AM |
The honey pot technique is aimed at preventing a different kind of bot.
The primary focus of what I am doing is preventing content ripping and automation (spamming generally), which running a honey pot would not help with.
|