.htaccess check |
Xabache |
Sep 22 2011, 11:37 AM
Post
#1
|
Group: Members Posts: 5 Joined: 17-September 11 Member No.: 15,419 |
An .htaccess file I've been working on, mostly taken from "the perfect .htaccess file" page. I added everything down to the error redirects. Take a look: any duplicate functions, anything I shouldn't be doing?
My primary concern is keeping everyone out of the site and preventing site downloads. However, I think it might be too tight: will Google, Bing and Yahoo still be able to index me? How can I allow all the big search engines to index me while keeping everyone else away? I would also like to be able to deny certain browsers beyond the index page, or preferably deny all browsers except the ones I choose, with all others being redirected back to the index page. Is this possible? Thanks for the help.
CODE
<Files .htaccess>
order allow,deny
deny from all
</Files>
IndexIgnore *
# disable directory browsing
Options All -Indexes
AddType application/octet-stream .mobi
AddType application/octet-stream .pdf
AddType application/octet-stream .epub
AddType application/octet-stream .zip
## .htaccess Code :: BEGIN
## Block Bad Bots by user-Agent
SetEnvIfNoCase user-Agent ^FrontPage [NC,OR]
SetEnvIfNoCase user-Agent ^Java.* [NC,OR]
SetEnvIfNoCase user-Agent ^Microsoft.URL [NC,OR]
SetEnvIfNoCase user-Agent ^MSFrontPage [NC,OR]
SetEnvIfNoCase user-Agent ^Offline.Explorer [NC,OR]
SetEnvIfNoCase user-Agent ^[Ww]eb[Bb]andit [NC,OR]
SetEnvIfNoCase user-Agent ^Zeus [NC]
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
## .htaccess Code :: END
Options +FollowSymlinks
# Protect Hotlinking
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www.)?domainname.com/ [nc]
RewriteRule .*.(gif|jpg|png)$ http://domainname.com/img/hotlink_f_o.png [nc]
ErrorDocument 400 /index.html
ErrorDocument 401 /index.html
ErrorDocument 403 /index.html
ErrorDocument 404 /index.html
ErrorDocument 500 /index.html
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
# this ruleset is to "stop" stupid attempts to use MS IIS exploits on us
# NIMDA
RewriteCond %{REQUEST_URI} /(cmd|root|shell)\.exe$ [NC,OR]
RewriteCond %{REQUEST_URI} /(admin|httpodbc)\.dll$ [NC]
RewriteRule .* /cgi-bin/nonimda.cmd [L,E=HTTP_USER_AGENT:NIMDA_EXPLOIT,T=application/x-httpd-cgi]
# CODERED
RewriteCond %{REQUEST_URI} /default\.(ida|idq)$ [NC,OR]
RewriteCond %{REQUEST_URI} /.*\.printer$ [NC]
RewriteRule .* /cgi-bin/nocode-r.cmd [L,E=HTTP_USER_AGENT:CODERED_EXPLOIT,T=application/x-httpd-cgi]
# this ruleset is for formmail script abusers...
RewriteCond %{REQUEST_URI} formmail\.(pl|cgi)$ [NC,OR]
RewriteCond %{REQUEST_URI} mailto\.(exe|cgi)$ [NC]
RewriteRule .* /cgi-bin/nofrmml.cmd [L,E=HTTP_USER_AGENT:FORMMAIL_EXPLOIT,T=application/x-httpd-cgi]
# Cyveillance is a spybot that scours the web for copyright violations and “damaging information” on
# behalf of clients such as the RIAA and MPAA. Their robot spoofs its User-Agent to look like Internet
# Explorer, and it completely ignores robots.txt. I have
# banned it by IP address.
RewriteCond %{REMOTE_ADDR} "^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$"
RewriteRule .* - [F]
# There is another email harvester which always claims to be referred from http://www.iaea.org/.
# You may have seen this in your own referrer pages.
# I have banned it by referrer.
RewriteCond %{HTTP_REFERER} iaea\.org [NC]
RewriteRule .* - [F]
# NameProtect peddles their “online brand monitoring” to unsuspecting and gullible companies
# looking for people to sue. Despite the claims on their robot information page, they do not
# respect robots.txt; in fact, they spoof their User-Agent in multiple ways to avoid detection.
# I have banned them by User-Agent and IP address.
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} NPBot [NC]
RewriteRule .* - [F]
# this ruleset is for unwanted useragents... possibly email harvesters
RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Browse\s [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Eval [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Surf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Harvest [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ^.*libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LWP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*prospector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AsiaNetBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ASSORT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} attache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ATHENS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} autohttp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bew [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bullseye [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} curl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} devsoft's\ http\ component [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Deweb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Digimarc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Digger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} digout4uagent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} dloader(NaverRobot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ecollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Educate\ Search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EO\ Browse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} fastlwspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FEZhead [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Franklin\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Full\ Web\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Getleft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetURL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetWebPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Gozilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} go-ahead-got-it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTML\ Works [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IBM_Planetwide [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IncyWincy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Industry\ Program [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Explore\ 5\.x [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Irvine [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} KWebGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} leech [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MCspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missauga\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Monster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla.*NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/3\.0\.\+Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/3.Mozilla\/2\.01 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/4\.0$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozzilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} netattache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetCarta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NICErsPRO [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OpaL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Openfind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OpenTextSiteCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PackRat [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Plucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Production\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Program\ Shareware [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PushSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RepoMonkey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Rover [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Rsync [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ScoutAbout [NC,OR]
RewriteCond %{HTTP_USER_AGENT} searchterms\.it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} semanticdiscovery [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Shai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sitecheck [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Spegla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SpiderBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SurfWalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tarspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Telesoft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Templeton [NC,OR]
RewriteCond %{HTTP_USER_AGENT} UtilMind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} w3mir [NC,OR]
RewriteCond %{HTTP_USER_AGENT} web.by.mail [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebBandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebEMailExtrac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebMiner [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSnake [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webvac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webwalk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WhosTalking [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WUMPUS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} www\.pl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} XGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Zeus.*Webster [NC]
#RewriteCond %{HTTP_USER_AGENT} test [NC]
RewriteCond %{REQUEST_URI} !^/badUA\.html [NC]
RewriteRule .* /badUA.html [L,E=HTTP_USER_AGENT:BAD_USER_AGENT]
# this ruleset is to stop blank user agents with blank referrers
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* /cgi-bin/noagent.cmd [L,T=application/x-httpd-cgi]
|
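A note on the "Block Bad Bots" section of the file above: SetEnvIfNoCase takes a header name, a regular expression, and then the name of the variable to set. The [NC,OR] flags belong to RewriteCond, so in that position they would be read as the variable name, and Deny from env=bad_bot would have nothing to match. Below is a minimal sketch of how that section might look with the variable named explicitly. It keeps the Apache 2.2 access directives used in the post and leaves out the <Limit> wrapper so the rule covers every request method; both choices are assumptions about what the poster wants, not a tested drop-in.

```apache
# Tag matching User-Agent strings with the bad_bot environment variable.
# SetEnvIfNoCase already matches case-insensitively, so no flags are needed.
SetEnvIfNoCase User-Agent "^FrontPage" bad_bot
SetEnvIfNoCase User-Agent "^Java" bad_bot
SetEnvIfNoCase User-Agent "^Microsoft.URL" bad_bot
SetEnvIfNoCase User-Agent "^MSFrontPage" bad_bot
SetEnvIfNoCase User-Agent "^Offline.Explorer" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase User-Agent "^Zeus" bad_bot

# Apache 2.2 syntax, as in the original post: with "Order Allow,Deny" the
# Deny line wins for any request that carries the bad_bot variable.
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

On Apache 2.4 the Order/Allow/Deny directives are only available through mod_access_compat, so this form is tied to the 2.2 setup the thread appears to assume.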
pandy |
Sep 22 2011, 12:16 PM
Post
#2
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
Why do you want to stop UAs with a blank REFERER header and what happens to them? REFERER is an optional header and many browsers let you turn it off. If you don't have a special reason you shouldn't lock them out.
|
Darin McGrew |
Sep 22 2011, 01:05 PM
Post
#3
|
WDG Member Group: Root Admin Posts: 8,365 Joined: 4-August 06 From: Mountain View, CA Member No.: 3 |
QUOTE Why do you want to stop UAs with a blank REFERER header and what happens to them? REFERER is an optional header and many browsers let you turn it off. If you don't have a special reason you shouldn't lock them out.
Yep. And some corporate firewalls strip it. (It's an easy way to prevent internal URLs from leaking outside the firewall.)
|
pandy |
Sep 22 2011, 01:57 PM
Post
#4
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
Also, if you click a link on your own computer or paste the URL in, there won't be any referrer either.
|
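Following on from the replies about the Referer header: if the aim is only to turn away clients that send no User-Agent at all, one option is to leave Referer out of the test completely. This is a sketch under assumptions, not the original poster's rule: it answers with a plain 403 instead of the /cgi-bin/noagent.cmd script, and it matches more requests than the original pair of conditions did, since that pair required both headers to be blank.

```apache
RewriteEngine On

# Match an empty User-Agent (or a bare "-"); Referer is not consulted,
# so visitors whose browser or firewall strips Referer are unaffected.
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]
```

Some legitimate scripts and monitoring tools also send no User-Agent, so whether this is worth having depends on the site.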
Christian J |
Sep 22 2011, 03:33 PM
Post
#5
|
. Group: WDG Moderators Posts: 9,656 Joined: 10-August 06 Member No.: 7 |
QUOTE My primary concern is keeping everyone out of the site and preventing site downloads. However, I think it might be too tight: will Google, Bing and Yahoo still be able to index me? How can I allow all the big search engines to index me while keeping everyone else away?
With "everyone else", do you mean bots?
QUOTE I would also like to be able to deny certain browsers beyond the index page, or preferably deny all browsers except the ones I choose, with all others being redirected back to the index page. Is this possible?
Both bots and browsers can (and do) present themselves as someone else: http://en.wikipedia.org/wiki/User_agent#User_agent_spoofing. Even if it did work, search engines might consider it cloaking: http://www.google.com/support/webmasters/b...py?answer=66355
See also http://htmlhelp.com/feature/art2.htm
|
Xabache |
Sep 29 2011, 12:07 AM
Post
#6
|
Group: Members Posts: 5 Joined: 17-September 11 Member No.: 15,419 |
I've since removed this bit because it was blocking my own images from displaying to me.
Options +FollowSymlinks
# Protect Hotlinking
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www.)?domainname.com/ [nc]
RewriteRule .*.(gif|jpg|png)$ http://domainname.com/img/hotlink_f_o.png [nc]
So, about the IndexIgnore *: how do I allow known search bots past this?
|
Darin McGrew |
Sep 29 2011, 11:52 AM
Post
#7
|
WDG Member Group: Root Admin Posts: 8,365 Joined: 4-August 06 From: Mountain View, CA Member No.: 3 |
Based on the first example here: http://httpd.apache.org/docs/2.2/rewrite/access.html
I'd try simplifying the regular expression on the second RewriteCond statement, like this:
RewriteCond %{HTTP_REFERER} !example.com [nc]
|
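Put together with the rest of the hotlinking block from earlier in the thread, the simplified condition might look like the sketch below. Assumptions here: example.com stands in for the real domain, the dot is escaped (a small tightening of the suggested pattern), the replacement image is served by an internal rewrite instead of a full http:// URL, and an extra condition keeps the replacement image itself from being rewritten again.

```apache
RewriteEngine On

# Requests with no Referer at all (typed URLs, privacy settings, firewalls) pass.
RewriteCond %{HTTP_REFERER} !^$
# Requests referred from the site itself pass; example.com is a placeholder.
RewriteCond %{HTTP_REFERER} !example\.com [NC]
# Leave the replacement image alone so the rule cannot loop on it.
RewriteCond %{REQUEST_URI} !/img/hotlink_f_o\.png$ [NC]
# Everything else asking for an image gets the replacement instead.
RewriteRule \.(gif|jpg|png)$ /img/hotlink_f_o.png [NC,L]
```

To refuse the request outright, as the linked Apache example does, the last line can be replaced with RewriteRule \.(gif|jpg|png)$ - [F,NC] and the third condition dropped.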