[EP-tech] Re: Exposing metadata -EPrints

madhan muthu mu.madhan at gmail.com
Tue Jul 2 15:33:09 BST 2013


Dear All:
>
> When the AGRIS team tried to harvest our repository records, they came
> back with a request to expose keywords and the journal title as separate
> elements in our OAI-PMH responses. Our repository currently exposes its
> metadata as shown at http://oar.icrisat.org/cgi/oai2. How can we expose
> metadata elements the way we want in EPrints?
>
> Thanks
> Madhan
>
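Whatever change is made on the EPrints side, it can be checked from the harvester's side. One way is to fetch a single record in oai_dc format from the endpoint above (verb=GetRecord&metadataPrefix=oai_dc) and compare its dc:subject and dc:source values with the record's keywords and journal title. The Python sketch below assumes the standard OAI-PMH 2.0 and Dublin Core namespaces. It also assumes the keywords would be mapped to dc:subject, which is a common choice but not something this thread confirms. The sample record and keyword list are hypothetical, not taken from the ICRISAT repository.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_values(xml_text, element):
    """Return the text of every dc:<element> found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    tag = "{%s}%s" % (DC_NS, element)
    return [(e.text or "").strip() for e in root.iter(tag)]


def unexposed_keywords(xml_text, keywords):
    """Return the keywords that do not appear (case-insensitively) as dc:subject."""
    subjects = {value.lower() for value in dc_values(xml_text, "subject")}
    return [kw for kw in keywords if kw.lower() not in subjects]


# Hypothetical GetRecord response: one subject is exposed, but there is no
# dc:source (journal title) and the second keyword is missing.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example record</dc:title>
      <dc:subject>Sorghum</dc:subject>
      <dc:identifier>http://oar.icrisat.org/5/</dc:identifier>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>"""

if __name__ == "__main__":
    # Keywords here are hypothetical stand-ins for an EPrints record's keywords.
    print(unexposed_keywords(SAMPLE, ["sorghum", "drought tolerance"]))
    print(dc_values(SAMPLE, "source"))
```

A non-empty result from unexposed_keywords, or an empty list from dc_values(..., "source"), means that element is still not reaching harvesters.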


-----Original Message-----
From: franc at library.iisc.ernet.in [mailto:franc at library.iisc.ernet.in]
Sent: Tuesday, July 02, 2013 5:29 PM
To: Madhan, M (ICRISAT-IN)
Subject: Re: Madhan requests help

Hi Madhan, I had a look at both our repositories. Their OAI records don't
explicitly expose the source and keyword metadata elements. If this needs
to be done, my understanding is that one has to tweak the EPrints code
(oai2.pl). You may also check with the EPrints tech list.

Best, Francis

On Tue, 2 Jul 2013, Madhan, M (ICRISAT-IN) wrote:

> Dear Francis:
>
> When the AGRIS team tried to harvest our repository records, they came
> back with a request to expose keywords and the journal title as separate
> elements in our OAI-PMH responses. Our repository exposes its metadata as
> shown at http://oar.icrisat.org/cgi/oai2, and the IISc repository exposes
> its metadata the same way. How can we expose metadata elements the way we
> want in EPrints?
>
> Thanks
> Madhan
>
> ________________________________
> From: Anibaldi, Stefano (OEKC) [mailto:Stefano.Anibaldi at fao.org]
> Sent: Tuesday, July 02, 2013 3:15 PM
> To: Madhan, M (ICRISAT-IN)
> Cc: Keizer, Johannes; Celli, Fabrizio (OEKC)
> Subject: RE: ICRISAT REPO
>
> Dear Madhan
>
> We discussed this internally and decided to put it on hold until the
> metadata harvested from ICRISAT includes keywords, either uncontrolled or
> Agrotags.
> This will also give you time to see whether it is possible to isolate at
> least the journal title and ISSN from the merged citation information
> that ICRISAT is currently putting into dc:identifier.
>
> Thank you and regards,
>
> Stefano
>
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 01 July 2013 14:59
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT REPO
>
> Hi Madhan
>
> I will discuss with my colleagues the feasibility of indexing the data
> given all the issues listed in the email reported below, which include the
> lack of keywords and the merged "citation" information in dc:identifier.
> I will get back to you as soon as possible.
>
> Cheers
>
> Stefano
>
>
>
> From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan at cgiar.org]
> Sent: 01 July 2013 06:54
> To: Anibaldi, Stefano (OEKC); Johannes Keizer
> Subject: ICRISAT REPO
>
> Dear Stefano:
>
> I am trying to tweak the code to expose keywords separately to
> harvesters, and I am discussing this with forum members. In the meantime,
> could you please harvest the records without keywords? We may have to
> re-run the harvest once our repository can expose keywords as well.
>
> Many thanks
>
> M Madhan
> Manager, Library and Information Services
> Knowledge Sharing and Innovation
> International Crops Research Institute for the Semi-Arid Tropics
> Patancheru, Hyderabad 502 324
> M.Madhan at cgiar.org
> mu.madhan at gmail.com
>
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 01 July 2013 10:10
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT open access data
>
> Dear Madhan,
>
> Thanks a lot, and no problem :)
>
> Yes, as I wrote below: "The main 'subjects' are present, but not the
> keywords, either uncontrolled or Agrotags (taken from AgroPedia). This
> occurs with all of the metadata formats offered.
> I also noticed that one of the search engines (BASE) has harvested and
> indexed your metadata, and the result is completely missing this
> information, which is essential especially for AGRIS and its RDF store."
> No problem regarding the publication of the data; I have actually just
> come back from holidays myself. :)
>
> Thanks again,
> Stefano
>
>
> From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan at cgiar.org]
> Sent: 23 June 2013 12:06
> To: Anibaldi, Stefano (OEKC)
> Cc: Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT open access data
>
> Stefano:
>
> Sorry again. There was a family emergency, so I had to rush off on leave.
>
> By the way, I just noticed that the "uncontrolled keywords" are not
> exposed. I tried to use "Agrotagger", but I gave up as it was not able to
> assign proper keywords to documents. Let me find a way to expose the
> keywords and get back to you. Shall we delay indexing for a couple of days
> so that I can give it a try?
>
>
> Madhan
> ________________________________
> From: Anibaldi, Stefano (OEKC) [Stefano.Anibaldi at fao.org]
> Sent: 21 June 2013 14:53:57
> To: Madhan, M (ICRISAT-IN)
> Cc: Keizer, Johannes
> Subject: RE: ICRISAT open access data
> Dear Madhan,
>
> Could you please advise whether the OAI data can include the keywords,
> and possibly also part of the merged "citation" information in
> dc:identifier?
>
> Thank you and regards
>
> Stefano
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 18 June 2013 16:11
> To: 'Madhan, M (ICRISAT-IN)'
> Subject: RE: ICRISAT open access data
>
> No problem, Madhan, take your time.
> Please also include Johannes in your email, since we had a joint
> discussion on this specific issue this morning.
> Cheers
> Stefano
>
> From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan at cgiar.org]
> Sent: 18 June 2013 15:01
> To: Anibaldi, Stefano (OEKC)
> Subject: RE: ICRISAT open access data
>
> Stefano:
>
> Sorry for the belated reply; I was a bit held up.
>
> Give me a day and I will send a detailed note addressing all the queries.
> Thanks
>
> Madhan
>
> ________________________________
> From: Anibaldi, Stefano (OEKC) [mailto:Stefano.Anibaldi at fao.org]
> Sent: Tuesday, June 18, 2013 5:34 PM
> To: Madhan, M (ICRISAT-IN)
> Cc: Keizer, Johannes
> Subject: RE: ICRISAT open access data
>
> Dear Madhan,
>
> This morning I had a brief discussion with Johannes (in copy), and we
> agreed to accept the metadata with the full citation information merged
> as it is.
> We would nevertheless recommend having journal titles, ISSN, ISBN,
> pagination, volume/number information and so on indexed in separate
> fields.
>
> On another front, please let us know whether it is feasible to expose the
> keywords through the OAI-PMH interface in a way that lets us index them
> in AGRIS, too.
>
> Thank you and regards,
>
> Stefano
>
>
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 14 June 2013 14:23
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT open access data
>
> Hi Madhan,
>
> There are two further issues, separate from the URL-link problems listed
> below, that need to be resolved before we can finalize the harvesting and
> indexing of the ICRISAT data.
> First, regarding the subjects and the indexing part more generally: I had
> the chance to look in more detail at the data that you sent via FTP, and
> it appears that the ICRISAT OAI-PMH data does not expose the keywords,
> even though they are present in the ICRISAT Open Access Repository.
> The main "subjects" are present, but not the keywords, either
> uncontrolled or Agrotags (taken from AgroPedia). This occurs with all of
> the metadata formats offered.
> I also noticed that one of the search engines (BASE) has harvested and
> indexed your metadata, and the result is completely missing this
> information, which is essential especially for AGRIS and its RDF store.
>
> Second, I found that a complete set of information, such as journal
> title, publication date, collation, publisher, volume/number, authors and
> so on, is lumped together in dc:identifier (along with the URLs). It is
> essential that this information be separated into its proper metadata
> elements.
> AGRIS has dedicated indexes for dates, names, journals, ISSNs and other
> information. If all of this is merged into one element, indexing it
> becomes impossible unless there is a fixed pattern that would allow us to
> normalize the text internally, and in this case there is none.
>
> Kindly let me know.
>
> Cheers
> Stefano
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 11 June 2013 16:00
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: 'Giannis Stoitsis'; 'nikosm at agroknow.gr'; Keizer, Johannes (OEKC)
> Subject: RE: ICRISAT open access data
>
> Hello Madhan,
>
> Thanks for your response.
> I am sure we'll find a solution to this issue, since, as things stand, we
> would have problems publishing the metadata harvested from your OAI
> server.
> I looked at some of the more than five thousand records harvested by our
> Agro-Know colleagues, but I will take just one record as an example, with
> three "links to the full text".
> The main problem is that there are multiple dc:identifier elements, and
> we are not sure which one is right for AGRIS. AGRIS needs one URL that
> leads the user to the full text; fittingly, this link is labelled
> "Full-Text" in AGRIS Search. With the three URLs offered in the attached
> XML record (I noticed that most records offer four), this seems difficult
> to achieve. Consider the following three URLs:
>
> 1. http://oar.icrisat.org/5/1/cs51_5pp-2011_%282%29.pdf
>
> 2. http://dx.doi.org/10.2135/cropsci2010.07.0440
>
> 3. http://oar.icrisat.org/5/
>
> URL No. 1 leads the user to the following screen, which shows that access
> to the PDF is restricted. Result: the user leaves this page and perhaps
> goes back to the reference itself and follows link No. 2.
>
> [Screenshot of the restricted-access page; image not preserved]
> No. 2, labelled "The Official URL", is the DOI link to the publisher's
> metadata page, where the publication can only be purchased or accessed by
> subscription (!)
> No. 3 is the metadata record as exposed and published in the ICRISAT
> repository; it contains the widget you mention below, which should
> provide the user with the resource itself.
>
> A quick temporary solution would be to index the ICRISAT data including
> only the URLs that actually land on the full text, excluding all other
> URLs.
>
> Please let me know what you think, and how we could index only the URLs
> that link directly to the full text.
>
> Best regards,
>
> Stefano
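The choice described above, picking one link out of several dc:identifier values, can be sketched on the harvester side by classifying each value by its URL shape. The patterns below are inferred from the example URLs in this thread and are assumptions, not a documented EPrints contract: a landing page like http://oar.icrisat.org/5/, a stored file like http://oar.icrisat.org/5/1/<file>, a dx.doi.org link, and non-URL citation text. The default preference order follows Madhan's suggestion elsewhere in this thread of linking to the persistent landing page. A harvester that only wants directly downloadable files could reorder it.

```python
import re
from urllib.parse import urlparse

# Assumption: hostname and path shapes taken from the example URLs in this thread.
REPO_HOST = "oar.icrisat.org"
LANDING_PATH = re.compile(r"^/\d+/?$")           # e.g. /5/
DOCUMENT_PATH = re.compile(r"^/\d+/\d+/[^/]+$")  # e.g. /5/1/file.pdf
DOI_HOSTS = {"dx.doi.org", "doi.org"}


def classify_identifier(value):
    """Label one dc:identifier value by what it appears to point at."""
    parsed = urlparse(value.strip())
    if parsed.scheme not in ("http", "https"):
        return "citation"  # free text, e.g. the merged citation string
    if parsed.netloc in DOI_HOSTS:
        return "doi"
    if parsed.netloc == REPO_HOST:
        if LANDING_PATH.match(parsed.path):
            return "landing_page"
        if DOCUMENT_PATH.match(parsed.path):
            return "document"
    return "other_url"


def pick_link(identifiers, preference=("landing_page", "document", "doi")):
    """Return the first identifier of the most preferred kind, or None."""
    first_of_kind = {}
    for value in identifiers:
        first_of_kind.setdefault(classify_identifier(value), value.strip())
    for kind in preference:
        if kind in first_of_kind:
            return first_of_kind[kind]
    return None


if __name__ == "__main__":
    record = [
        "http://oar.icrisat.org/5/1/cs51_5pp-2011_%282%29.pdf",
        "http://dx.doi.org/10.2135/cropsci2010.07.0440",
        "http://oar.icrisat.org/5/",
        "Hypothetical merged citation, Crop Science 51 (5)",
    ]
    for value in record:
        print(classify_identifier(value), value)
    print(pick_link(record))
```

Restricted PDFs still classify as "document". Telling open files from restricted ones would need information the repository does not currently expose, as Madhan notes below.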
>
>
> From: Madhan, M (ICRISAT-IN) [mailto:M.Madhan at cgiar.org]
> Sent: 07 June 2013 05:04
> To: Anibaldi, Stefano (OEKC)
> Cc: 'Giannis Stoitsis'; 'Nikos Manolis'; 'nikosm at agroknow.gr'; Johannes Keizer
> Subject: RE: ICRISAT open access data
>
> Stefano:
>
> Greetings!
>
> In our repository, direct download of a few documents is restricted to
> our local users. For all other users, each restricted document has a
> "request a copy" button. Harvesters normally redirect users to the
> repository to download the full text, so you may consider linking to the
> persistent URL of the document's metadata page rather than to the
> full-text PDF. I don't have indicators (OA or restricted) readily built
> into the repository. I would request you to harvest all the metadata of
> our repository; each record provides a way to reach the full text.
>
> Many thanks.
>
> Madhan
>
> See: http://oar.icrisat.org/6842/
>
> ________________________________
> From: Anibaldi, Stefano (OEKC) [mailto:Stefano.Anibaldi at fao.org]
> Sent: Thursday, June 06, 2013 8:00 PM
> To: Madhan, M (ICRISAT-IN)
> Cc: Keizer, Johannes; Subirats, Imma (OEKC)
> Subject: RE: ICRISAT open access data
>
> Dear Madhan,
>
> Could you please tell me whether there is a way for us to distinguish the
> links to the full text of documents that are open to the entire community
> from those that are not?
> We are close to publishing the ICRISAT metadata in AGRIS, but we do not
> want to publish the restricted publications.
>
> Thanks and regards
>
> Stefano A.
> AGRIS Team
>
>
> From: Anibaldi, Stefano (OEKC)
> Sent: 04 June 2013 17:19
> To: 'Madhan, M (ICRISAT-IN)'
> Cc: Giannis Stoitsis (stoitsis at ieee.org); Nikos Manolis
> (manolisn at agroknow.gr); nikosm at agroknow.gr
> Subject: ICRISAT open access data
>
> Dear Madhan
>
> Hope things are fine with you. Long time, no see. :)
>
> We are now working with our Greek colleagues at Agro-Know (cc'd), who are
> trying to harvest and process the metadata from your repository.
> I have a question about the full-text information in the ICRISAT OAI-PMH
> data.
> In the harvested data, in both the DIDL and METS formats, I noticed a few
> URLs (I did not check them all) pointing to PDFs that cannot be accessed
> freely.
> For example, in the first three records of an XML document I harvested
> just now, all the PDFs have restricted access, and the user ends up at the
> ICRISAT login page:
> http://oar.icrisat.org/15/1/1606_ftp.pdf
> http://oar.icrisat.org/86/1/AsianBiotechDevRev_12_3_17-34_2010.pdf
> http://oar.icrisat.org/87/1/BiosystemsEng105_2_198-204_2010.pdf
>
> I am not sure what percentage of the ICRISAT collection is really open
> access, but I imagine that when an open repository exposes metadata via
> OAI-PMH, it would be really great to hide the URLs that do not link to
> the full text of the document, since that information in itself is not
> useful to aggregators, search engines, or the students and researchers
> who want to deepen their knowledge directly from the internet without
> having to send requests to the data owners.
>
> What do you think?
>
> Thanks and regards
> Stefano A.
>


-- 
Madhan, M
Manager, Library and Information Services
International Crops Research Institute for Semi-Arid Tropics (ICRISAT)
Patancheru, Hyderabad
India
www.icrisat.org