[EP-tech] Re: RFC access log table

Alan.Stiles Alan.Stiles at open.ac.uk
Fri Feb 15 10:30:24 GMT 2013


Hi Tim,

Having had a quick look through the access table, it would also be nice to have an option to include or exclude a list of known robots and spiders from the CSV dumps. It might also be worth stripping them from the access table itself, outside of the dumps, to keep it to a more manageable size without losing 'relevant' information. Bing and Yandex appear to be among our worst offenders.
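
As a rough illustration of the include/exclude option, here is a minimal Python sketch that drops robot hits while writing a CSV dump. It is not part of EPrints: the marker substrings, the function names and the "requester_user_agent" field name are all assumptions made for the example, and a real robot list would need to be configurable and much longer.

```python
import csv
import io

# Hypothetical marker substrings; only Bing and Yandex are named in this thread.
ROBOT_MARKERS = ("bingbot", "yandex")


def is_robot(user_agent, markers=ROBOT_MARKERS):
    """Return True if the user-agent string contains any known robot marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in markers)


def filter_rows(rows, exclude_robots=True, ua_field="requester_user_agent"):
    """Yield access rows, optionally skipping rows generated by known robots.

    The ua_field default is an assumed column name, not a confirmed one.
    """
    for row in rows:
        if exclude_robots and is_robot(row.get(ua_field)):
            continue
        yield row


def write_csv(rows, fieldnames):
    """Serialise rows (dicts) to a CSV string with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Calling filter_rows(rows, exclude_robots=False) passes everything through, so the same dump code could serve both the "include" and "exclude" cases.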

Alan.

-----Original Message-----
From: Tim Brody [mailto:tdb2 at ecs.soton.ac.uk] 
Sent: 15 February 2013 09:32
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Re: RFC access log table

Hi,

Yes, nothing in the core relies on the data in access*. IRStats 1 and 2
use the access table to create their summary data.

It looks like the best solution is to provide a tool that periodically
dumps historic access data to files, while keeping "current" data (as
defined by config) in the database.
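
A minimal Python sketch of that idea, under stated assumptions: rows are dicts carrying a "datestamp" datetime, the retention window (keep_days) stands in for the config setting, and the function names and the access-YYYY-MM.csv file naming are invented for illustration. It only shows the split and the per-month CSV output; reading from and deleting from the database is left out.

```python
import csv
import os
from collections import defaultdict
from datetime import datetime, timedelta


def split_by_age(rows, now, keep_days=90):
    """Split rows into (current, historic) around a retention cutoff.

    keep_days is a hypothetical stand-in for the configured "current" window.
    """
    cutoff = now - timedelta(days=keep_days)
    current, historic = [], []
    for row in rows:
        (current if row["datestamp"] >= cutoff else historic).append(row)
    return current, historic


def dump_monthly(historic, out_dir, fieldnames):
    """Append historic rows to one CSV file per calendar month.

    Returns the list of file paths written, in month order.
    """
    by_month = defaultdict(list)
    for row in historic:
        by_month[row["datestamp"].strftime("%Y-%m")].append(row)

    paths = []
    for month, month_rows in sorted(by_month.items()):
        path = os.path.join(out_dir, f"access-{month}.csv")
        is_new = not os.path.exists(path)
        with open(path, "a", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            if is_new:
                # Write the header only once per file.
                writer.writeheader()
            writer.writerows(month_rows)
        paths.append(path)
    return paths
```

Only the rows returned as historic would be candidates for removal from the table, and only after the dump files have been written and checked.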

All the best,
Tim.

On Fri, 15 Feb 2013 08:13:52 +0100, Yuri <yurj at alfa.it> wrote:
> We have a test server which is a clone of the production server. Can I 
> empty those access tables safely to save space? :) Can I do a "DELETE 
> FROM access" without any issue? The same for access__ordervalues_en and 
> those for all the other languages?
> 
> Il 15/02/2013 03:13, Mark Gregson ha scritto:
>> Hi Tim
>>
>> Because of the DB backup issues we invested some time a while ago in some
>> scripts for archiving the access data off to monthly dumps and for
>> restoring it (if required, say by the need to have IRStats reprocess all
>> data). These scripts are not actually in production use because I haven't
>> had time to test them to my satisfaction (sorry Nick!).
>>
>> CSV is a more accessible format than a MySQL dump, which may be a
>> benefit.
>>
>> We are using IRStats for statistics, which uses the access table, but I
>> guess it could easily be updated with a new parser. We also do some
>> custom logging to the access table for reporting on outbound link clicks
>> via IRStats. This logging is handled via EPrints::Apache::LogHandler.
>>
>> Cheers
>> Mark
>>
>>
>> -----Original Message-----
>> From: eprints-tech-bounces at ecs.soton.ac.uk
>> [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of Tim Brody
>> Sent: Thursday, 14 February 2013 8:01 PM
>> To: eprints-tech at ecs.soton.ac.uk
>> Subject: [EP-tech] RFC access log table
>>
>> Hi All,
>>
>> I'm thinking about the access log table and how it can be made
>> sustainable.
>>
>> What I'm suggesting is to write accesses to CSV-formatted log files, one
>> file per month. What I don't know is whether anyone is relying on the
>> database table for generating statistics?
>>
>> The problem the access log table creates is in backing-up the EPrints
>> database.
>>
>> I'd appreciate any thoughts/comments.
>>
>> --
>> All the best,
>> Tim
>>
>> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>> *** Archive: http://www.eprints.org/tech.php/
>> *** EPrints community wiki: http://wiki.eprints.org/
> 

-- 
All the best,
Tim.




