Deny All

(rev. 1.5 - 2021-10-15)




Contents:
If you are reading this, I let you in. Don't be evil.

(But, of course you will ...)

Some concepts in the "ban everyone" approach to website(s):

Everyone stay out

On my site, hacker and bot traffic was outweighing legitimate traffic by 5000 to 1 or so. So, the main part of my site is "Deny All" by default.

(But, I decided to put this info in the "public" part of my site because I went to a lot of work to collect this information and the point is to share it. Whether it is useful and how you use it is up to you.)

The concept: Blocking all traffic except for "permitted" IPs makes for a much simpler .htaccess than trying to block all the evil-doers. Here are some IPs you might want to grant access:

#bingbot
 Require ip 207.46.13
#googlebot
 Require ip 64.21.98.41 66.249.65.1 66.249.65.5 66.249.66 
 Require ip 66.249.73.132 66.249.73.149 66.249.73.151
#mojeek
 Require ip 5.102.173.71
#qwant
 Require ip 194.187.168 91.242.162.18
#yandex
 Require ip 77.88.5
                
The search engine "giants" have many more IPs. I was having trouble with Bing and Yandex seeking non-existent gibberish files, so I haven't permitted many of their IPs. You can tell from your logs whether there are additional IPs you would like to add to your "permit" list. (I do want *some* traffic, so blocking search engine IPs is a bit counter-productive!) Huawei's "PetalBot" is just ridiculously misbehaved, so not listed at all. Nor are the ones that haven't bothered to show up at my site yet.
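To see which visitors a list like the one above would let in, here is a minimal Python sketch of how a partial entry such as "66.249.66" covers addresses. It assumes IPv4 dotted-quad entries only (no CIDR, no IPv6), and the function names are my own, not anything from Apache.

```python
def matches_require_ip(ip, entry):
    """Return True if `ip` falls under a full or partial dotted-quad entry."""
    ip_octets = ip.split(".")
    entry_octets = entry.rstrip(".").split(".")
    # A partial entry such as "66.249.66" is compared octet by octet,
    # so it covers 66.249.66.0 through 66.249.66.255 and nothing else.
    return ip_octets[:len(entry_octets)] == entry_octets


def is_permitted(ip, entries):
    """Return True if any entry in the permit list covers `ip`."""
    return any(matches_require_ip(ip, entry) for entry in entries)


# A few entries copied from the permit list above.
PERMITTED = ["207.46.13", "66.249.66", "66.249.73.132", "5.102.173.71"]

print(is_permitted("66.249.66.7", PERMITTED))   # True
print(is_permitted("66.249.67.7", PERMITTED))   # False
```

Comparing whole octets matters: "207.46.13" should not let in 207.46.130.x, which a plain string-prefix test would.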

We do not need any SEO or "security" goombas, of which there are now many. They seem to have massive crawling power and show up constantly.

No site is "unhackable." There are sites that have been hacked and sites that will be hacked. But, there is no reason to provide an open invitation via bulky CMS code. There is no reason to allow known pests (TOR, VPNs, cloud servers) access. If we limit bot and hacker access, it will take longer for the persistent hacker to find a doorway in.

You will note that my "semi-anonymous" approach would grant access to a hacker who uses a VPN and a throw-away email address. This would also mean other VPN users on the same system would find a doorway via the now "open" IP. Hopefully, by monitoring the logs, I will catch the perps and block them before they do any damage.

Probably I need to automate "unpermitting" an IP as well as permitting it.
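A rough sketch of what "unpermitting" could look like, in Python. It assumes the permit list is kept as plain "Require ip" lines like the ones shown earlier; the function name is hypothetical, and reading and writing the actual .htaccess file is left out.

```python
def unpermit(htaccess_text, ip):
    """Return the .htaccess text with `ip` removed from every `Require ip` line."""
    kept = []
    for line in htaccess_text.splitlines():
        tokens = line.split()
        if [t.lower() for t in tokens[:2]] == ["require", "ip"]:
            remaining = [t for t in tokens[2:] if t != ip]
            if not remaining:
                continue  # the line listed only this IP, so drop the line
            if len(remaining) != len(tokens) - 2:
                # the IP was on this line: rebuild it without that entry
                line = " Require ip " + " ".join(remaining)
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = (
    "#mojeek\n"
    " Require ip 5.102.173.71\n"
    "#qwant\n"
    " Require ip 194.187.168 91.242.162.18\n"
)
print(unpermit(sample, "91.242.162.18"))
```

Note that when a whole line is dropped, the comment above it (such as "#mojeek") stays behind. That is harmless to Apache but may be worth cleaning up by hand.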

Why hostname blocks often don't work.

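My understanding is that the Apache server will run a reverse DNS lookup on the visitor's IP and then a forward lookup on the resulting hostname. If the IP and hostname do not match up, the hostname rule is not applied, so Apache will not block the hostname. Hostnames are consistently spoofed.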
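The same double lookup can be tried by hand. This is a minimal Python sketch of a forward-confirmed reverse DNS check, not Apache's actual code. The two resolver parameters are my own addition so the logic can be exercised without touching the network, and it handles IPv4 only.

```python
import socket


def _reverse(ip):
    # Reverse (PTR) lookup: IP address -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # Forward lookup: hostname -> list of IPv4 addresses.
    return socket.gethostbyname_ex(host)[2]


def forward_confirmed(ip, reverse=_reverse, forward=_forward):
    """Return the hostname if reverse and forward DNS agree, else None."""
    try:
        host = reverse(ip)
        addresses = forward(host)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    # A spoofed PTR record names a host that does not resolve back to the IP.
    return host if ip in addresses else None
```

Calling forward_confirmed() with just an IP from your logs performs real DNS queries. A None result means the hostname in the log cannot be trusted for that visitor.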

Another "trick" can be to make the "hostname" look like an IP address. So, if you are not logging both the IP address and the hostname, it will not be apparent this has been done. For instance, my provider logs either the IP address or the hostname in the logs, but not both at the same time. (This is determined by the contents of the .htaccess file.) While it's possible to create one's own log via PHP files, that is an additional project and additional load on the server.

Also, many VPNs will incorporate the IP address in the hostname in some manner. Google content will reverse the IP address, as do some other servers. Some will separate the last three digits from the other three sets of numbers and place them with some letters, such as "sub.267". Apparently, these hosts want to be able to identify which of their customers are causing problems (based on logs) but don't want to make it too easy for webmasters to simply filter them out.
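If you do have both the IP and the hostname, spotting the first two patterns is simple enough. Here is a small Python sketch that only checks for the four octets appearing in order or in reverse order; the hostnames are made-up examples, and it does not try to catch the split "sub.267" style.

```python
import re


def ip_in_hostname(ip, hostname):
    """Return "forward" or "reversed" if the IP's octets appear in the hostname."""
    octets = ip.split(".")
    # Pull every run of digits out of the hostname, whatever separates them.
    groups = re.findall(r"\d+", hostname)
    n = len(octets)
    for i in range(len(groups) - n + 1):
        window = groups[i:i + n]
        if window == octets:
            return "forward"
        if window == octets[::-1]:
            return "reversed"
    return None


# Hypothetical hostnames, for illustration only.
print(ip_in_hostname("66.249.66.1", "crawl-66-249-66-1.example.net"))    # forward
print(ip_in_hostname("66.249.66.1", "1.66.249.66.static.example.net"))   # reversed
print(ip_in_hostname("66.249.66.1", "mail.example.net"))                 # None
```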

No More Favicon

Who needs it, right? Waste of bandwidth -- and also, hackers will check for the favicon to see if they are banned.
Add this to your <head> section: <link rel="icon" href="data:,">

Send 'em Back Where They Came From

I don't know if this is a good idea or not. Code added to the top of 404.php and/or 403.php will redirect visitors to their own IP. A bot isn't a browser, so it will probably just add the redirect to the links it is crawling. (But, a lot of bots went away.)
<?php
    echo '<meta http-equiv="refresh" content="0; URL=https://'.$_SERVER['REMOTE_ADDR'].'" />';
?>
                
(Of course, you will have to add:

ErrorDocument 403 /403.php
ErrorDocument 404 /404.php
                
to your .htaccess file.)

Get rid of robots.txt

For the most part, bots do not honor robots.txt directives. It's a pointless file, so get rid of it. Hacker bots check to see if it exists to verify the site's existence.

The bots will then get a 404, and if you have your 404 page kicking them back to their own IP, well, that's just another effort of theirs that failed.

Even good bots are bad ...


VPNs and/or Cloud Services

I haven't figured out whether "my potential audience" will be visiting from virtual machines in the cloud. I don't really care if visitors wish to be anonymous, I just don't want them messing with (or attempting to mess with) my website. If I authorize "your" VPN IP, it's not just you, though, is it ... ?

The point is, most of the hacking traffic could be blocked at the source if these companies wanted to do so. It's not all that difficult to determine if your user is sending out "GET wp-login" requests or random "POST" requests via bot.

So, let's list some services that we probably don't want to ever hear from because their customers are constantly hacking away. As far as I can tell, VPN companies frequently set up their nodes on major cloud services, so the two problem sources are sort of inseparable. If you are running a legit website, you probably are not sending bots out to hack other people's websites, right? (At least I'm not ...) So, it won't matter if your website IP address is blocked from accessing my website.

Many bots will change IP addresses on each request, running through a series of Colocrossing, OVH, Datacamp or other IPs from across the globe to test whether the site is banning by location. They will also change their user-agent string, just in case the webmaster is blocking some element of it.

Multi-VPN attacks

A common attack technique now makes requests from multiple VPNs simultaneously (or in rapid succession). Example:

Here we have someone who attempted to access my site at the exact same time from "Johannesburg S.A./Canada/Helsinki, Finland (Fibregrid x2), Amsterdam Netherlands/Israel (SC-RAPIDSEEDBOX), Ontario CA / Los Angeles CA (B2Net Solutions/Servermania) and Buffalo NY/Estonia (Colocrossing)".

107.172.170.169 / 10/13/21, 5:34 AM 1426 error 403 GET HTTP/1.1 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36. Whois: 107.172.170.169

144.168.225.96 / 10/13/21, 5:34 AM 1401 error 403 GET HTTP/1.1 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36. Whois: 144.168.225.96

185.122.170.19 / 10/13/21, 5:34 AM 1401 error 403 GET HTTP/1.1 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36. Whois: 185.122.170.19

196.242.47.79 / 10/13/21, 5:34 AM 1398 error 403 GET HTTP/1.1 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36. Whois: 196.242.47.79

196.244.200.193 /403_2.php 10/13/21, 5:34 AM 1060 success 200 GET HTTP/1.1 http://daltrey.org Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36. Whois: 196.244.200.193
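Bursts like the one above can be picked out of a log automatically. A minimal Python sketch, assuming each log line has already been parsed into an (IP, timestamp, user-agent) tuple: the parsing itself, the function name, and the threshold of three distinct IPs are all my own assumptions.

```python
from collections import defaultdict

# The user-agent string shared by all five requests in the log above.
UA = ("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) "
      "Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36")

# (ip, timestamp, user_agent) tuples taken from the log entries above.
ENTRIES = [
    ("107.172.170.169", "10/13/21, 5:34 AM", UA),
    ("144.168.225.96", "10/13/21, 5:34 AM", UA),
    ("185.122.170.19", "10/13/21, 5:34 AM", UA),
    ("196.242.47.79", "10/13/21, 5:34 AM", UA),
    ("196.244.200.193", "10/13/21, 5:34 AM", UA),
]


def find_bursts(entries, min_ips=3):
    """Group requests by (timestamp, user agent); keep groups with many distinct IPs."""
    groups = defaultdict(set)
    for ip, timestamp, user_agent in entries:
        groups[(timestamp, user_agent)].add(ip)
    return {key: sorted(ips) for key, ips in groups.items() if len(ips) >= min_ips}


for (timestamp, _agent), ips in find_bursts(ENTRIES).items():
    print(timestamp, "->", len(ips), "distinct IPs:", ", ".join(ips))
```

Since these timestamps only resolve to the minute, a busy site would likely want a tighter key or a higher threshold to avoid flagging ordinary visitors who happen to share a popular browser.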

No Hits from Hackers!!