Ban Spiders by User Agent
Mod Version: 3.1.2, by Simon Lloyd

vB Version: 4.x.x Rating: (65 votes - 4.86 average) Installs: 492
Released: 09 Aug 2011 Last Update: 18 Dec 2014 Downloads: 2005
Supported Uses Plugins  

What this mod does
With this mod you can enter user agents to watch or ban. You can also receive emails or have an Output.txt file created and updated with the time and date of each visit. It doesn't have to be just spiders: you can watch, log or ban any user agent!

How to install
Simply import the product ban_spider. The mod is active by default, but none of the other options are turned on.

What is a UserAgent?
https://en.wikipedia.org/wiki/User_agent

Understanding a UserAgent string
http://user-agent-string.info/parse

Genuine User Getting Blocked?

This mod can ONLY block the user agents that you have entered in the list. First, get your user to go to http://whatsmyuseragent.com/ (via his phone) and find out what his user agent is. Then go to http://www.botsvsbrowsers.com/SimulateUserAgent.asp, paste his UA string in and test it to see whether you get denied.

If you are denied, something in his user agent string is in your list, so it's not the mod's fault: it's banning what you asked it to.
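
If you'd rather check offline which entry is catching a genuine user, something like the following can help. This is only a rough Python sketch, not the mod's own code (the mod is a vBulletin plugin). It assumes the match is a plain, case-insensitive substring test, as described elsewhere on this page. The function name `matching_entries`, the sample list and the sample UA string are all made up for illustration.

```python
def matching_entries(user_agent, ban_list):
    """Return every ban-list entry that appears inside the user agent string.

    Assumption: entries are matched as literal, case-insensitive substrings.
    """
    ua = user_agent.lower()
    hits = []
    for entry in ban_list:
        needle = entry.strip().lower()
        if needle and needle in ua:
            hits.append(entry)
    return hits


# Hypothetical ban list and a hypothetical phone user agent.
ban_list = ["Baidu", "Yandex", "Mobile", "Wget"]
phone_ua = "Mozilla/5.0 (Linux; Android 4.4) AppleWebKit/537.36 Mobile Safari/537.36"

print(matching_entries(phone_ua, ban_list))
```

Whatever comes back is the entry (or entries) to remove or make more specific.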

Tools to help
http://whatsmyuseragent.com/SwitchingUserAgents.asp
http://www.botsvsbrowsers.com/SimulateUserAgent.asp

FAQ

Quote by ForceHSS
going good no bugs in it so far that I can see
one thing would be nice to see as it seems to miss Thread Prefixes even if I make it forced to use them on a section it wont add them
It won't add prefixes because they are added when the forum loads. Your actual URL stays the same and a prefix is never added to it. Have you ever seen a URL like http://www.mysite.com/showthread?t=[solved]12345?

Quote by ozzy47
If a spider is banned, how do I get them to crawl my site again, I tried your full ban list, and now my website monitor services are no longer checking my site.

I removed all spiders from admin except Baidu.
You added your site monitoring service as a bad bot? Bad move! Remember, we're sending them a 301, which is a permanent redirect. If you don't see them back in a week, check with them; you can ask for your URL to be crawled again.
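
To picture why a monitoring service goes quiet, here is a rough Python sketch of what a banned visitor gets back. It is not the mod's code. The function `respond`, the sample list, the `UptimeChecker` agent and the redirect URL are invented for illustration, and the match is assumed to be a case-insensitive substring test.

```python
def respond(user_agent, ban_list, redirect_url):
    """Return (status, headers) for a request, decided before any page is built."""
    ua = user_agent.lower()
    for entry in ban_list:
        needle = entry.strip().lower()
        if needle and needle in ua:
            # 301 = Moved Permanently: the client is told the page lives elsewhere.
            return 301, {"Location": redirect_url}
    return 200, {}


# Hypothetical list where a monitoring agent was added by mistake.
ban_list = ["Baidu", "UptimeChecker"]
print(respond("UptimeChecker/1.0", ban_list, "http://www.example.com/"))
print(respond("Mozilla/5.0 (Windows NT 6.1)", ban_list, "http://www.example.com/"))
```

A client that honours the permanent redirect may simply stop asking for the original URL, which is why you may need to ask the service to check it again.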

Quote by GreyGhost
Hi Simon, I just sent PM to test beta.

I have the released version installed on our vBCMS 4.1.7 but it doesn't seem to be banning Baidu. Our forums are located in the root with the CMS (so no /forums/), not sure if it's to do with this.

I have Track Guest Visits installed and it still shows 40-50 Baidu every day.

I've double checked my settings... only have "Ban Spiders In List" selected, no logging etc.

My List is:
Yandex
Yeti
Baidu
soso
sogou
ichiro
speedy
spinn3r
mlbot
psbot
SBIder
Ezooms
snap shots
metauri
YoudaoBot
youdao

Anyway, will try beta and see if that fixes it.

8-)

PS. I hope your daughter and grandson are doing well.
Right, firstly, thanks, they're now doing great. Your "Track Guest Visits" mod will ALWAYS show the spiders, but your native vBulletin WOL (Who's Online) will not. The TGV mod picks them up because they are actually accessing your site (so that mod is doing its job and recording them), but my mod prevents their request from being completed, i.e. a direct request for a URL counts as a forum access, but they are redirected permanently before the thread loads (so my mod is ALSO doing its job).

Hope that clears things up for you all.

@GreyGhost I'll PM you details of the beta.

How does it work?

OK, I've checked and I don't see any of these bots in your native vBulletin WOL. The other mods you have for statistics, total visitors, etc. WILL log these as visiting because the bots are directly accessing a URL and the logging is done before the URL loads completely. My mod also bans them at this point, so both mods are working.

Just as a note, you're using the create-a-thread option. You can quickly end up with thousands of threads, so it's better to use the output.txt logging.

Note to all!:
If you have Simon in your ban list this will ban the following:
simon
SimonLloyd
Lloyd simon
thisisanincrediblylongsimonwordhere

Get the idea? You don't need to add all of those to your ban list, because the mod looks for the string "simon" (case doesn't matter) anywhere in the user agent string. The whole entry has to appear exactly as typed, so if you'd used this in your list:
Simon*\Lloyd
It would NOT ban:
Simon
Simon Lloyd
thisissimonlloydinastring
but it WOULD ban
Simon*\Lloyd-in(this.string)
thisstringSimon*\Lloydhere
....etc

Hope you all understand this better now and can get to removing duplicates from your list.
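
If your list is long, pruning can be done with a few lines of script. The following is a minimal Python sketch, assuming the literal, case-insensitive substring matching described above. The function name `prune_redundant` and the sample list are hypothetical. An entry is dropped when another entry in the list is already contained in it, because the shorter one will catch everything the longer one would.

```python
def prune_redundant(ban_list):
    """Drop blank lines, duplicates, and entries already covered by another entry."""
    cleaned = []
    seen = set()
    for entry in ban_list:
        e = entry.strip()
        if e and e.lower() not in seen:
            seen.add(e.lower())
            cleaned.append(e)

    kept = []
    for entry in cleaned:
        low = entry.lower()
        # Covered if some *other* entry is a substring of this one.
        covered = any(o.lower() in low for o in cleaned if o.lower() != low)
        if not covered:
            kept.append(entry)
    return kept


# Hypothetical list with the kind of duplicates described above.
sample = ["simon", "SimonLloyd", "Lloyd simon", "Simon", "", "Baidu", "baiduspider"]
print(prune_redundant(sample))
```

Note that an entry such as Simon*\Lloyd would be kept by this sketch, since no shorter entry here is contained in it.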

@tricksodave, you can delete the temp account for me now, thanks. Also, if you read the above, please prune your list.

If any of you have trouble editing your lists, let me know and I'll help with anything you're stuck on.

What's a bot?
https://en.wikipedia.org/wiki/Spambot

How do I ban a bot?

Blocking spiders is all about personal choice. Do a little research and find out whether you want to cater for that country and whether they add value to your site! When Deepnet Explorer is visiting, go to Who's Online and, at the bottom, select Yes in the "Show Useragent?" dropdown box, then check out their user agent. You can enter any or all of the UA string, so if they actually do have Deepnet in the UA, just enter that on its own line in the list.

As a side note, you don't need the full user agent string anymore to ban them. You can now enter any part of the string, e.g.:
bai will result in Baidu being banned, as will any user agent string containing "bai".
Entering Mozilla will result in every user agent string containing it being banned.

So entering the bot name rather than the full user agent string will do. Enter Baidu for that spider; don't enter Ya as something to ban, as Yahoo will be banned just as Yandex will.
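
The effect of short entries is easy to try out before you touch your real list. A minimal Python sketch follows, assuming a case-insensitive substring match; it is an illustration, not the mod's code. `is_banned` is a made-up helper and the three UA strings are just samples of the kind these crawlers send.

```python
def is_banned(user_agent, ban_list):
    """True if any non-blank ban-list entry occurs anywhere in the user agent."""
    ua = user_agent.lower()
    return any(e.strip().lower() in ua for e in ban_list if e.strip())


# Sample user agent strings for illustration.
baidu_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
yandex_ua = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
yahoo_ua = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

for entry in ["bai", "Ya", "Yandex", "Mozilla"]:
    caught = [name for name, ua in
              [("Baidu", baidu_ua), ("Yandex", yandex_ua), ("Yahoo", yahoo_ua)]
              if is_banned(ua, [entry])]
    print(entry, "->", caught)
```

With these samples, Ya catches both Yandex and Yahoo while Yandex catches only Yandex, which is the reason to enter the full bot name.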

Where's output.txt located?

The output.txt is generated as bots found in your list attempt to call a forum or thread. There's no time lag, and the file should be created straight away. If you have no CMS, the file should be available at http://www.mysite.com/output.txt; if the forum is in a folder, then something like http://www.mysite.com/forum/output.txt

Any issues, post back and I'll deal with them for you.
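
For anyone wondering what "created and updated" amounts to, here is a rough Python sketch of append-style logging. The exact line format the mod writes to output.txt isn't documented on this page, so the timestamp-plus-user-agent layout below is an assumption, and `log_visit` is a made-up name. The sketch writes to a temporary folder rather than a real forum directory.

```python
import os
import tempfile
from datetime import datetime


def log_visit(path, user_agent, when=None):
    """Append one line per visit; the file is created on the first call."""
    when = when or datetime.now()
    # Assumed layout: date and time, a tab, then the user agent.
    line = f"{when:%Y-%m-%d %H:%M:%S}\t{user_agent}\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
    return line


# Example run against a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
log_visit(path, "Baiduspider/2.0", datetime(2011, 6, 12, 9, 30, 0))
log_visit(path, "YandexBot/3.0", datetime(2011, 6, 12, 9, 31, 5))
print(open(path, encoding="utf-8").read())
```

Because the file is opened in append mode, it exists from the first matching visit onwards and simply grows after that.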

Bad bot lists

Try these:
http://www.forumpostersunion.com/index.php?t=1644
http://www.vbseo.com/f34/how-create-vbulletin-bot-scraper-trap-47378/index4.html

But to be honest, it's nothing that a little Googling or Binging won't resolve.

Try this list, it works well for me:

Baidu
almaden
Anarchie
ASPSeek
attach
autoemailspider
BackWeb
Bandit
BatchFTP
BlackWidow
Bot\mailto:email
Buddy
bumblebee
CherryPicker
ChinaClaw
CICC
Collector
Copier
Copyscape
Crescent
DIIbot
DISCo
DISCo\Pump
dotbot
Download\Demon
Download\Wonder
Downloader
Drip
DSurf15a
eCatch
EasyDL/2.99
EirGrabber
email
EmailCollector
EmailSiphon
EmailWolf
Express\WebPictures
ExtractorPro
EyeNetIE
FileHound
FlashGet
FrontPage
GetRight
GetSmart
GetWeb!
gigabaz
Go\!Zilla
Go!Zilla
Go-Ahead-Got-It
gotit
Grabber
GrabNet
Grafula
grub-client
HMView
HTTrack
httpdown
.*httrack.*
ia_archiver
Image\Stripper
Image\Sucker
Indy*Library
Indy\Library
InterGET
InternetLinkagent
Internet\Ninja
InternetSeer.com
Iria
JBH*agent
JetCar
JOC\Web\Spider
JustView
larbin
LeechFTP
LexiBot
lftp
Link*Sleuth
likse
//Link
LinkWalker
Mag-Net
Magnet
Mass\Downloader
Memo
Microsoft.URL
MIDown\tool
Mirror
Mister\PiX
Mozilla.*Indy
Mozilla.*NEWT
Mozilla*MSIECrawler
MS\FrontPage*
MSFrontPage
MSIECrawler
MSProxy
Navroad
NearSite
NetAnts
NetMechanic
NetSpider
Net\Vampire
NetZIP
NICErsPRO
Ninja
Nutch
Octopus
Offline\Explorer
Offline\Navigator
Openfind
PageGrabber
Papa\Foto
pavuk
pcBrowser
Ping
PingALink
Pockey
psbot
Pump
QRVA
RealDownload
Reaper
Recorder
ReGet
Scooter
Seeker
Siphon
sitecheck.internetseer.com
SiteSnagger
SlySearch
SmartDownload
Snake
sogou
Soso
SpaceBison
Spinn3r
sproose
Stripper
Sucker
SuperBot
SuperHTTP
Surfbot
Szukacz
tAkeOut
Teleport\Pro
URLSpiderPro
Vacuum
VoidEYE
vBSEO
Web\Image\Collector
Web\Sucker
WebAuto
[Ww]eb[Bb]andit
webcollage
WebCopier
Web\Downloader
WebEMailExtrac.*
WebFetch
WebGo\IS
WebHook
WebLeacher
WebMiner
WebMirror
WebReaper
WebSauger
Website
Website\eXtractor
Website\Quester
Webster
WebStripper
WebWhacker
WebZIP
Wget
Whacker
Widow
WWWOFFLE
x-Tractor
Xaldon\WebSpider
Xenu
Yandex
Yeti
YOUDAOBOT
Zeus.*Webster
Zeus

Quote by meaters
Awesome mod, thanks!

Saved our community from Baidu, hundreds of bots were online persistenly to the point of crashing our server.
And with only the addition of, per line:

MSIE 1
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6

You end 99.9% of all spam bot registration attempts and cut garbage traffic even further.

Here's my entire ban list for this Mod:

baiduspider
beta.statsit.com
statsit
SiteIntel
Yandex
GomezAgent
FunWebProducts
MSIE 1
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
w3m

Tested on vB3.7.x, vB3.8.x and vB4.x.x, but should work on any version.

____________________________________________________________________
Special thanks to:
Lior
KH99
BoP5
for helping me sort out a few issues

...and beta testers

ForceHSS (Special thanks to Force for latest testing)
ozzy47
GreyGhost

If you use this please mark as INSTALLED

History
9th June 2011 Original xml added
12th June 2011 Added both email notification and text file logging
22nd June 2011 Version 2.0.0, Added create thread on activity
  1. Added match facility you can now use something like Yandex and it will match MOZILLA/5.0 (COMPATIBLE; YANDEXBOT/3.0; +HTTP://YANDEX.COM/BOTS)
  2. Added clickable link to visited thread
22nd September 2011 added user redirect url selection
08th October Beta testing started for thread creation.
20th October Beta testing started for emailing.
21st October Beta testing complete Ver 3.0.0 uploaded
29th October minor fix added to cope with empty userid on thread creation
30th October Beta testing automatic redirection to spiders/bots IP
31st October New xml uploaded with automatic redirect to IP
25th November Minor fix for blank forumid fixed
26th November 2011 Fixed version check & create thread Off by default
17th December 2014 Version 3.1.0 uploaded, Hook changed extra logging and statistics added by Ozzy47 (Chris)
18th December 2014 Version 3.1.1 uploaded, prevented spiders being counted when mod turned off.
17th December 2014 Version 3.1.2 uploaded, due to rogue code from another mod
The Bad Bots list is now included in the product
Please prune out all those that you wish to be able to see your site (I suggest you definitely prune out "DA" and "Custo").

Support will now only be given to those who have this mod marked as INSTALLED

Download

File: product-ban_spider4x.xml (30.8 KB, 485 downloads)

Supporters / CoAuthors

  • ozzy47
