I checked the list again (well, a quarter of it). Most requests are obviously single-IP hits, so they're probably just people following search results.
Most bogus or disruptive requests come from large companies (I've seen more Facebook-related access this morning than yesterday), including Amazon itself. There are also a couple of apparently VPN-related data centers in Europe (Germany, Finland, others) and Singapore (one of the main "culprits", a Huawei-owned data center), but it's mostly the U.S. and Canada that we see a lot of seemingly untargeted hits from. I suspect those to be the origins of possible scraping operations: they seem to use random URLs as part of their requests, most of which will go nowhere but will still cause the server to throw errors.
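For anyone who wants to document this a bit more systematically, here is a rough sketch of how the "random URLs that only produce errors" pattern could be counted per IP. It assumes the raw access log is available in the usual Common/Combined Log Format, which may not match what our host actually gives us; the function names and the thresholds are made up for illustration.

```python
import re
from collections import Counter

# Assumption: lines look like Common/Combined Log Format, e.g.
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /foo HTTP/1.1" 404 512
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def error_hits_by_ip(lines):
    """Count total requests and error responses (status >= 400) per IP."""
    total = Counter()
    errors = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ip, status = match.group(1), int(match.group(2))
        total[ip] += 1
        if status >= 400:
            errors[ip] += 1
    return total, errors


def suspicious_ips(lines, min_requests=3, min_error_ratio=0.8):
    """Return IPs with several requests, most of which ended in errors.

    The thresholds are arbitrary starting points, not tested values.
    """
    total, errors = error_hits_by_ip(lines)
    return sorted(
        ip
        for ip, count in total.items()
        if count >= min_requests and errors[ip] / count >= min_error_ratio
    )
```

A single-IP visitor coming from a search result would not show up here, since one hit stays below `min_requests`; only addresses that keep requesting pages that fail would be listed.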
So, while high "guest" numbers may be indicative of high traffic, they're only partially related to site performance. What angers me is that there are obviously some points of origin (i.e. IPs or IP blocks) that are most probably disruptive, but we can't do anything about them, at least nothing that's not complicated and time-consuming. Blocking VPN-related IP ranges may result in members losing access, for instance. It's definitely kind of frustrating.
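To make the risk of locking out members a bit more concrete, this is a minimal sketch of how one could check a candidate block list against known member addresses before applying it. It uses Python's standard `ipaddress` module; the function name and the example ranges are hypothetical, and I'm assuming we could get a list of recent member IPs at all.

```python
import ipaddress


def members_affected(block_ranges, member_ips):
    """Map each member IP to the candidate block ranges that would cover it."""
    # strict=False accepts ranges written with host bits set, e.g. 203.0.113.5/24
    networks = [ipaddress.ip_network(r, strict=False) for r in block_ranges]
    affected = {}
    for ip_text in member_ips:
        ip = ipaddress.ip_address(ip_text)
        hits = [
            str(net)
            for net in networks
            if ip.version == net.version and ip in net
        ]
        if hits:
            affected[ip_text] = hits
    return affected


# Example with documentation-only addresses (not real traffic):
# members_affected(["203.0.113.0/24"], ["203.0.113.45", "198.51.100.9"])
```

If the result is empty, the candidate ranges would not hit any of the listed members; any entry in it is a member who would lose access.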
On the other hand, obsessing about it won't change anything. So, if you feel the need to document things, please continue to do so. But I suspect that there's no easy solution out there, at least not an accessible one.
Here's another observation for you, though. Interestingly, whenever I start digging into things regarding the list, numbers quickly go down (after half an hour or so) ... We're already at about half the numbers that appeared when I started this morning. That opens up an interesting possibility: Tracerouting, IP pinging and WHOIS requests may drive some of those operations away, at least temporarily.
EDIT: In the past ten minutes, numbers have been surging again ...
M.