[GOAL] Fwd: Online Academic Abuses and the Power of Openness: Naming & Shaming
Stevan Harnad
amsciforum at gmail.com
Wed Aug 1 02:18:04 BST 2012
---------- Forwarded message ----------
From: Eugene Garfield <eugene.garfield at thomsonreuters.com>
Date: Tue, Jul 31, 2012 at 5:44 PM
Subject: Re: [SIGMETRICS] Online Academic Abuses and the Power of Openness:
Naming & Shaming
To: SIGMETRICS at listserv.utk.edu
Dear Stevan: Some of your readers may not be able to access my 1997 letter
to BMJ because you have posted a link that requires a subscription to the
BMJ files. The proper link to use is
http://garfield.library.upenn.edu/papers/bmj14june1997.pdf
where I have posted the full text.
Unfortunately my comments have been distorted in some cases to justify
deplorable excesses in the use of references to the same journal when I
emphasized that such references should be relevant and not mere window
dressing -- a blatant attempt to increase the impact factor of the journal
in question. Best wishes. Gene Garfield
------------------------------
*From:* ASIS&T Special Interest Group on Metrics [mailto:
SIGMETRICS at LISTSERV.UTK.EDU] *On Behalf Of *Stevan Harnad
*Sent:* Tuesday, July 31, 2012 10:03 AM
*To:* SIGMETRICS at LISTSERV.UTK.EDU
*Subject:* Re: [SIGMETRICS] Online Academic Abuses and the Power of
Openness: Naming & Shaming
Sorry for the long delay in replying to this. I missed it, and it has just
been drawn to my attention:
On 10 April 2012 Gustaf Nelhans wrote:
> Dear Professor Harnad,
> I believe that it is not always easy to identify the motives behind
> specific instances of self references (although in the case at hand, the
> number of mutual citations identified seem to speak for themselves…). The
> practice of self citation is (as you acknowledge) not in itself a bad
> thing, but the problem is how to distinguish its legitimate use from its
> abuse.
Agreed. And in fact, tests comparing rankings and correlation patterns
based on total citation counts with those based on citation counts minus
self-citations tend to give very similar outcomes. Nevertheless, comparing
individuals' counts with and without self-citations against the population
norms can raise a red flag, which can then be examined manually.
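A minimal sketch of such a comparison, on made-up numbers: it computes the Spearman rank correlation between total citation counts and counts with self-citations removed, and reports whose rank shifts most. The author labels, the counts, and the helper functions are illustrative assumptions, not data or code from any of the tests referred to above.

```python
from statistics import mean

def ranks(values):
    """Average ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: author -> (total citations, self-citations).
authors = {
    "A": (120, 10),
    "B": (95, 8),
    "C": (80, 50),
    "D": (60, 5),
    "E": (40, 3),
}
names = list(authors)
total = [authors[n][0] for n in names]
external = [authors[n][0] - authors[n][1] for n in names]

rho = spearman(total, external)

# Rank lost when self-citations are removed (positive = drops in the ranking).
rank_total, rank_external = ranks(total), ranks(external)
rank_drop = {n: rank_total[i] - rank_external[i] for i, n in enumerate(names)}
biggest_drop = max(rank_drop, key=rank_drop.get)

print(f"Spearman rho = {rho:.2f}; largest drop: {biggest_drop}")
```

With these invented numbers the two rankings still correlate at 0.7, yet author C, half of whose citations are self-citations, drops two places. That is the kind of red flag that would then be examined manually.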
> This is equally valid on the individual level as in editor-suggested
> references. I would like to draw into attention an exchange about these
> matters from 1997, where Eugene Garfield stated:
> “Recognising the reality of the Matthew effect, I believe that an editor
> is justified in reminding authors to cite equivalent references from the
> same journal, if only because readers of that journal presumably have ready
> access to it. To call this “manipulation” seems excessive unless the
> references chosen are irrelevant or mere window dressing.” (Garfield, Eugene.
> 1997. Editors are justified in asking authors to cite equivalent references
> from same journal. *BMJ* 314 (7096):1765.
> http://www.bmj.com/content/314/7096/1765.2.short )
Gene Garfield made this suggestion in 1997, before OA became a distinct
possibility. In a world where the only way to access articles is if your
institution can afford a subscription, "preferentially cite this journal"
might have had an ounce of validity -- alongside the obvious pound of
self-interest.
But no longer today.
For an editor to tell the author of an article to cite more articles from
his journal because readers have "more access" to it is outrageous. Rather,
he should tell authors to self-archive their articles (Green OA) if they
really want to make them more accessible.
> My question is if there could exist any method of identifying “bad
> apples” that does not account for the specific context in the article in
> which the reference is placed.
Only in a population statistical sense. Individual anomalies flagged by
the population metrics would still need to be examined manually.
But automated text-analytic tools may eventually also become sensitive
enough to make a contribution, sorting out some of the nature of the
citation from the accompanying text, not just from the
author/article/journal counts.
> In my understanding of the problem, the proposed way of using statistical
> methods for identifying baselines for self citations in various fields
> could be one important step, but I wonder if it would suffice to make the
> identification process complete?
It is a necessary but not a sufficient condition for answering all the
kinds of questions one might have about uses and misuses of citations.
In statistics there is always, and necessarily, a difference between
population data and individual cases. Medical conditions are the best
illustration: I have an illness. I want to be treated for my illness, and
not for what, on average, works most often with people that have symptoms
most like mine. (See Kahneman & Tversky on the base rate fallacy
<http://en.wikipedia.org/wiki/Base_rate_fallacy>.)
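Carried over to citation screening, the base-rate point can be put in numbers with Bayes' rule. The sketch below is only a worked illustration: the base rate, hit rate, and false-alarm rate are invented for the example, and `posterior_abuse` is a hypothetical helper name.

```python
def posterior_abuse(base_rate, hit_rate, false_alarm_rate):
    """P(abuse | flagged) by Bayes' rule.

    base_rate        -- assumed share of authors who really abuse citations
    hit_rate         -- assumed P(flagged | abuse)
    false_alarm_rate -- assumed P(flagged | no abuse)
    """
    flagged_abusers = hit_rate * base_rate
    flagged_honest = false_alarm_rate * (1 - base_rate)
    return flagged_abusers / (flagged_abusers + flagged_honest)

# All three rates are made up for illustration.
p = posterior_abuse(base_rate=0.02, hit_rate=0.90, false_alarm_rate=0.05)
print(f"P(abuse | flagged) = {p:.3f}")
```

Under these assumed rates, only about 27% of flagged individuals would be actual abusers (0.018 / 0.067), even though the detector catches 90% of abuse. A population-level flag is a reason to look at the individual case, not a verdict on it.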
For citations, "bad" citations can be identified on a statistical basis,
comparing two populations of citations, and perhaps even one individual's
total citations as compared to the population norms, to see whether there
is something anomalous (such as excess self-citation swelling the citation
count).
But it won't tell you whether an individual citation is good or bad. It is
possible to develop and apply automated text-analytic algorithms to the
text surrounding a citation, to try to predict whether it is positive or
negative, and such algorithms can even be "trained up" with corrective
feedback based on human evaluation of whether each individual citation was
positive or negative.
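To make the "trained up with corrective feedback" idea concrete, here is a toy sketch: a bag-of-words perceptron whose weights change only when a human label contradicts its prediction. It is an assumption-laden illustration, not an existing tool; the class name, the example sentences, and their labels are all invented, and a usable classifier would need far richer features and far more hand-labelled data.

```python
class CitationContextClassifier:
    """Toy perceptron over the words surrounding a citation."""

    def __init__(self):
        self.weights = {}  # word -> weight; positive weights favour "positive"

    def score(self, text):
        return sum(self.weights.get(w, 0.0) for w in text.lower().split())

    def predict(self, text):
        return "positive" if self.score(text) >= 0 else "negative"

    def correct(self, text, human_label):
        """Corrective feedback: update weights only if the prediction was wrong."""
        if self.predict(text) == human_label:
            return
        step = 1.0 if human_label == "positive" else -1.0
        for w in text.lower().split():
            self.weights[w] = self.weights.get(w, 0.0) + step

# Hypothetical hand-labelled citation contexts.
labelled = [
    ("confirms and extends the findings of", "positive"),
    ("fails to replicate the results of", "negative"),
    ("builds on the method of", "positive"),
    ("contradicts the claims of", "negative"),
]

clf = CitationContextClassifier()
for _ in range(3):  # a few passes of human-corrected feedback
    for text, label in labelled:
        clf.correct(text, label)

unseen_negative = clf.predict("fails to replicate the claims of")
unseen_positive = clf.predict("builds on and extends the method of")
print(unseen_negative, unseen_positive)
```

Function words such as "the" and "of" end up with zero weight here because they occur in both classes, while the words that carry the evaluation keep theirs.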
But it's early days for both of these, and validating statistical
predictors will first take an awful lot of individual hand-validation in
order to test and improve the algorithms.
But for journals or individuals it is definitely possible to check
computationally whether they deviate from population norms/baselines, and
then look at the cases that the population anomaly detectors single out,
and check them manually to see whether they are indeed cases of bad faith,
legitimate practice, or just statistical anomalies.
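One way such a check against population baselines might look, as a sketch: compute each journal's self-citation rate, compare it to the field's median using a median-absolute-deviation (modified z) score, and hand anything above a cutoff to a human. The journal labels and counts are invented, and the modified z-score with a 3.5 cutoff is just one common outlier convention chosen for this illustration, not a method proposed in this thread.

```python
from statistics import median

def flag_excess_self_citation(counts, cutoff=3.5):
    """Return journals whose self-citation rate is anomalously high.

    counts -- dict: journal -> (self_citations, total_citations)
    cutoff -- modified z-score threshold (3.5 is a common convention)
    """
    rates = {j: s / t for j, (s, t) in counts.items()}
    med = median(rates.values())
    mad = median(abs(r - med) for r in rates.values())
    if mad == 0:
        return []  # no spread in the field: this baseline cannot rank anyone
    flagged = []
    for journal, rate in rates.items():
        z = 0.6745 * (rate - med) / mad  # modified z-score
        if z > cutoff:  # one-sided: only excess self-citation is of interest
            flagged.append(journal)
    return flagged

# Hypothetical field of eight journals: (self-citations, total citations).
field = {
    "J1": (8, 100), "J2": (10, 100), "J3": (12, 100), "J4": (9, 100),
    "J5": (11, 100), "J6": (10, 100), "J7": (45, 100), "J8": (13, 100),
}
to_review = flag_excess_self_citation(field)
print(to_review)
```

The output is a list of candidates for manual review (here only "J7"), since the statistic alone cannot separate bad faith from legitimate practice or chance.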
Citation cartels (and many other systematic abuses) are more detectable if
the entire corpus is openly accessible, precisely because everybody can
then look for them: no need to wait to see whether proprietary database
owners with other interests get around to, or see fit to, provide the data
needed to monitor and detect abuses.
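With an open citation graph, anyone could run a first-pass screen for reciprocal citing along these lines. The sketch flags journal pairs in which each journal sends a large share of its outgoing (non-self) citations to the other; the function name, the 30% threshold, and the citation counts are all assumptions made for illustration, and a flagged pair is only a candidate for inspection.

```python
from collections import defaultdict

def reciprocal_pairs(citations, min_share=0.3):
    """Find journal pairs that cite each other disproportionately.

    citations -- dict: (citing_journal, cited_journal) -> citation count
    min_share -- minimum share of each journal's outgoing citations that
                 must go to the other journal (assumed threshold)
    """
    outgoing = defaultdict(int)
    for (src, dst), n in citations.items():
        if src != dst:  # journal self-citations are a separate question
            outgoing[src] += n

    pairs = []
    for (a, b), n_ab in citations.items():
        if a >= b:  # visit each unordered pair once; skips self-citation
            continue
        n_ba = citations.get((b, a), 0)
        if not outgoing[a] or not outgoing[b]:
            continue
        share_ab = n_ab / outgoing[a]
        share_ba = n_ba / outgoing[b]
        if share_ab >= min_share and share_ba >= min_share:
            pairs.append((a, b, round(share_ab, 2), round(share_ba, 2)))
    return pairs

# Hypothetical journal-to-journal citation counts.
graph = {
    ("J1", "J2"): 40, ("J2", "J1"): 35,
    ("J1", "J3"): 10, ("J2", "J3"): 15,
    ("J3", "J1"): 5,  ("J3", "J2"): 5,
    ("J3", "J4"): 40, ("J4", "J3"): 4,
    ("J4", "J1"): 30, ("J4", "J2"): 26,
}
suspects = reciprocal_pairs(graph)
print(suspects)
```

In this made-up graph only J1 and J2 are returned: J3 cites J4 heavily, but J4 does not reciprocate, so that pair is not flagged.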
Global OA not only provides the open database, but it provides the
(continuous) open means of flagging anomalies in the population pattern,
checking them, and naming and shaming the cases where there really has been
willful misuse or abuse.
It's yet another potential application for crowd-sourcing.
Stevan Harnad
Harnad, S. (2008) Validating Research Performance Metrics Against Peer
Rankings <http://www.int-res.com/abstracts/esep/v8/n1/p103-107/>. Ethics
in Science and Environmental Politics 8 (11) doi:10.3354/esep00088 The
Use And Misuse Of Bibliometric Indices In Evaluating Scholarly
Performance http://eprints.ecs.soton.ac.uk/15619/
Harnad, S. (2009) Open Access Scientometrics and the UK Research
Assessment Exercise <http://eprints.ecs.soton.ac.uk/17142/>.
Scientometrics 79 (1) Also in Proceedings of 11th Annual Meeting of the
International Society for Scientometrics and Informetrics 11(1),
pp. 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds. (2007)
Harnad, S. (2009) Multiple metrics required to measure research
performance <http://openaccess.eprints.org/index.php?/archives/508-guid.html>.
Nature (Correspondence) 457 (785) (12 February 2009)
<http://www.nature.com/nature/journal/v457/n7231/full/457785a.html>