[EP-tech] Seeing unusually high downloads in IRStats
Yuri
yurj at alfa.it
Tue Jul 26 10:49:37 BST 2016
IRStats is simply wrong to rely on HTTP accesses rather than a JavaScript
library (Piwik, Google Analytics). These libraries already know how to
fight spammers and bots, and they rely on real interaction with a web
browser rather than on a bare HTTP access.
The added value of IRStats is that it shows simple statistics for every
item, views and downloads, over a period of time. Replicating these simple
statistics in an existing system (like Piwik) would be the best solution.
On 26/07/2016 11:16, Enio Carboni wrote:
>
> Hi Betsy,
>
> I wrote an IP plugin for IRStats2 a few months ago (to exclude local
> admin IPs). You list single IPs, IP ranges, or CIDR blocks in a config
> file.
>
> To use it, add the new filter in cfg/cfg.d/z_irstats2.pl like this:
>
> $c->{irstats2}->{datasets} = {access => { filters => [ 'Robots',
> 'Repeat','IP' ] } },
>
> Note the last filter, 'IP'.
>
> You can download it from GitHub and try it:
> https://github.com/eniocarboni/irstats2-filter-by-ip
>
> There is also a test script, irstats2-filter-by-ip.pl, in
> archive/<ID>/bin for testing the config file before you process all
> the stats.
>
> You could use it this way:
>
> ./irstats2-filter-by-ip.pl <ID> 103.25.156.5
>
> or
>
> ./irstats2-filter-by-ip.pl <ID> 103.25.156.1-103.25.156.19
>
> Of course, do not forget to add the IP ranges to be discarded to
> cfg/cfg.d/z_irstats2_filter_ipcidr_blocks.pl
>
> Let me know if it is useful.
>
> Enio Carboni
>
> On Monday 25 July 2016 23:45:16 CEST, Coles, Elizabeth A.
> (Betsy) wrote:
>
> Forwarding from JISC-REPOSITORIES list – we’ve been seeing this in
> California too, and our IRStats2 counts are through the roof for the
> last couple of weeks.
>
> Can anyone tell me how to filter out these robots in IRStats2? And
> how to clean the access file so that our irstats2 reports are not
> distorted by this deluge? I assume I’d want to delete all entries
> with a requester_id in the table below and rerun IRstats2 setup from
> scratch.
>
> Thanks,
>
> Betsy Coles
>
> Caltech – Digital Library Development
>
> bcoles at caltech.edu
>
> From: Repositories discussion list
> [mailto:JISC-REPOSITORIES at JISCMAIL.AC.UK] On Behalf Of Hilary Jones
> Sent: Friday, July 15, 2016 3:43 AM
> To: JISC-REPOSITORIES at JISCMAIL.AC.UK
> Subject: Seeing unusually high downloads in IRStats - IRUS-UK's
> explanation and why this isn't affecting IRUS-UK stats
>
> Hi everyone,
>
> There was a discussion, via UKCORR mailing list, on why there are
> exceptionally high downloads being seen this week in IRStats and what
> might be causing it.
>
> After some investigation we have found that the unusually high
> downloads are down to four IP ranges:
>
> IP range       Organisation            Location   No. IP addresses
> 103.25.156.*   Microsoft Bingbot       China      128
> 103.36.96.*    Microsoft Corporation   China      216
> 111.221.28.*   Microsoft Bingbot       China      256
> 202.89.235.*   Microsoft Bingbot       China      80
>
> These IPs have been systematically trawling and downloading files from
> many UK repositories. Looking at their User Agent strings they do not
> declare themselves as bots but masquerade as normal users.
>
> Happily, the IRUS-UK ingest has been filtering out these robotic
> downloads, so you won’t see a massive spike in your IRUS-UK stats.
>
> We hope this is of help.
>
> Best wishes
>
> Hilary
>
> Hilary Jones
> Services and Projects Support
>
> 0161 413 7541 Skype hilary.jones at jisc.ac.uk
> Twitter @JonesHilaryJ
> 6th Floor Churchgate House, 56 Oxford Street, Manchester, M1 6EU
>
> jisc.ac.uk <http://www.jisc.ac.uk/>
>
> Jisc is a registered charity (number 1149740) and a company limited by
> guarantee which is registered in England under Company No. 5747339,
> VAT No. GB 882 5529 90. Jisc’s registered office is: One Castlepark,
> Tower Hill, Bristol, BS2 0JA. T 0203 697 5800. jisc.ac.uk
> <http://www.jisc.ac.uk/>
>
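
The IP-based filtering discussed in the thread can be illustrated with a
minimal sketch. This is not the code of the irstats2-filter-by-ip plugin (which
is written in Perl); it is a hypothetical Python illustration of the same idea.
The function names parse_block and is_blocked are invented for this example,
and reading a range such as 103.25.156.* as a /24 block is an assumption.

```python
import ipaddress

def parse_block(spec):
    """Turn one block spec into inclusive (first, last) integer bounds.

    Accepts a single IP, a 'start-end' range, a CIDR block, or an 'a.b.c.*'
    wildcard (treated here as a /24, which is an assumption).
    """
    spec = spec.strip()
    if spec.endswith(".*"):
        spec = spec[:-2] + ".0/24"
    if "-" in spec:
        start, end = (ipaddress.IPv4Address(p.strip()) for p in spec.split("-", 1))
        return int(start), int(end)
    net = ipaddress.IPv4Network(spec, strict=False)
    return int(net.network_address), int(net.broadcast_address)

# The four ranges reported in the IRUS-UK message quoted above.
BLOCKS = [parse_block(s) for s in (
    "103.25.156.*", "103.36.96.*", "111.221.28.*", "202.89.235.*",
)]

def is_blocked(ip, blocks=BLOCKS):
    """Return True if the IPv4 address falls inside any listed block."""
    value = int(ipaddress.IPv4Address(ip))
    return any(first <= value <= last for first, last in blocks)
```

For example, is_blocked("103.25.156.5") is true with the ranges above, while
an address in a neighbouring block such as 103.25.157.5 is not matched. An
access record whose requester IP matches would be discarded before counting.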
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/