[EP-tech] Seeing unusually high downloads in IRStats
Yuri
yurj at alfa.it
Tue Jul 26 14:21:09 BST 2016
With Apache (mod_rewrite), you can refuse requests by User-Agent:
RewriteEngine On
RewriteCond %{HTTP:User-Agent} (?:Yandex|msnbot|Owlinbo|sistrix|genieo|proximic|MJ12bot|AhrefsBot|searchmetrics|SearchmetricsBot|Baidu) [NC]
RewriteRule .? - [F]
Just add the offending user agents to the list.
Problem solved :-D
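One caveat: according to the report quoted below, the crawlers behind this spike do not declare themselves as bots in their User-Agent, so a User-Agent match alone may not catch them. A minimal sketch of blocking by client address instead, assuming mod_rewrite is enabled and treating each of the four ranges from the quoted JISC message as a whole /24 block (the message counts fewer than 256 addresses in some of them, so this may block slightly more than necessary):

```apache
# Assumption: the four ranges come from the quoted JISC message below;
# each is matched as a full /24, which may be broader than the addresses observed.
RewriteEngine On
# Conditions are OR-ed together: a request from any listed range matches.
RewriteCond %{REMOTE_ADDR} ^103\.25\.156\. [OR]
RewriteCond %{REMOTE_ADDR} ^103\.36\.96\. [OR]
RewriteCond %{REMOTE_ADDR} ^111\.221\.28\. [OR]
RewriteCond %{REMOTE_ADDR} ^202\.89\.235\.
# Answer matching requests with 403 Forbidden.
RewriteRule .? - [F]
```

This only refuses new requests; it does not remove hits that have already been recorded in the access table.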
On 26/07/2016 14:13, Graham, Clinton T wrote:
>
> The University of Pittsburgh opened ticket UCM000000270852 with Bing
> Webmaster Support last week regarding this and received the following
> response:
>
> Thank you for contacting Bing Webmaster Support. The activity you are
> seeing is most likely caused by one of our bots used for verifying
> your site rather than indexing your site as Bingbot does. These
> crawlers do not have the same UA, and are in place to make sure the
> verification aspects of your site are in place.
>
> Yesterday, we requested additional information on what “verification”
> really means, and described the problem of conflating user-generated
> activity with bot-generated activity, especially for the scholarly
> publication process.
>
> I’ll reply again here if this support request goes anywhere, but
> perhaps others might be interested in similarly engaging Bing
> Webmaster Support?
>
> Enjoy,
>
> - Clinton Graham
>
> Systems Developer
>
> University of Pittsburgh | University Library System
>
> 412-383-1057
>
> *From:*eprints-tech-bounces at ecs.soton.ac.uk
> [mailto:eprints-tech-bounces at ecs.soton.ac.uk] *On Behalf Of *Coles,
> Elizabeth A. (Betsy)
> *Sent:* Monday, July 25, 2016 7:45 PM
> *To:* eprints-tech at ecs.soton.ac.uk
> *Subject:* [EP-tech] Seeing unusually high downloads in IRStats
>
> Forwarding from JISC-REPOSITORIES list – we’ve been seeing this in
> California too, and our IRStats2 counts are through the roof for the
> last couple of weeks.
>
> Can anyone tell me how to filter out these robots in IRStats2? And
> how to clean the access file so that our irstats2 reports are not
> distorted by this deluge? I assume I’d want to delete all entries
> with a requester_id in the table below and rerun IRstats2 setup from
> scratch.
>
> Thanks,
>
> Betsy Coles
>
> Caltech – Digital Library Development
>
> bcoles at caltech.edu <mailto:bcoles at caltech.edu>
>
> *From:* Repositories discussion list
> [mailto:JISC-REPOSITORIES at JISCMAIL.AC.UK] *On Behalf Of *Hilary Jones
> *Sent:* Friday, July 15, 2016 3:43 AM
> *To:* JISC-REPOSITORIES at JISCMAIL.AC.UK
> <mailto:JISC-REPOSITORIES at JISCMAIL.AC.UK>
> *Subject:* Seeing unusually high downloads in IRStats - IRUS-UK's
> explanation and why this isn't affecting IRUS-UK stats
>
> Hi everyone,
>
> There was a discussion, via UKCORR mailing list, on why there are
> exceptionally high downloads being seen this week in IRStats and what
> might be causing it.
>
> After some investigation we have found that the unusually high
> downloads are down to four IP ranges:
>
> IP range       Organisation            Location   No. IP addresses
> 103.25.156.*   Microsoft Bingbot       China      128
> 103.36.96.*    Microsoft Corporation   China      216
> 111.221.28.*   Microsoft Bingbot       China      256
> 202.89.235.*   Microsoft Bingbot       China      80
>
> These IPs have been systematically trawling and downloading files from
> many UK repositories. Looking at their User Agent strings they do not
> declare themselves as bots but masquerade as normal users.
>
> Happily, the IRUS-UK ingest has been filtering out these robotic
> downloads, so you won’t see a massive spike in your IRUS-UK stats.
>
> We hope this is of help.
>
> Best wishes
>
> Hilary
>
> Jisc
> <http://www.jisc.ac.uk/>
>
> *Hilary Jones*
> Services and Projects Support
>
> 0161 413 7541
> Skype hilary.jones at jisc.ac.uk <mailto:hilary.jones at jisc.ac.uk>
> Twitter @JonesHilaryJ
> 6th Floor Churchgate House, 56 Oxford Street, Manchester, M1 6EU
>
> *jisc.ac.uk <http://www.jisc.ac.uk/>*
>
> Jisc is a registered charity (number 1149740) and a company limited by
> guarantee which is registered in England under Company No. 5747339,
> VAT No. GB 882 5529 90. Jisc’s registered office is: One Castlepark,
> Tower Hill, Bristol, BS2 0JA. T 0203 697 5800. jisc.ac.uk
> <http://www.jisc.ac.uk/>
>
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/