<div style="font-family:Helvetica;font-size:medium"><div style="margin-top:0cm;margin-right:0cm;margin-left:0cm;margin-bottom:0.0001pt"><font class="Apple-style-span" face="Arial">Sorry for the long delay in replying to this. I missed it, and it has just been drawn to my attention:</font></div>
<div style="margin-top:0cm;margin-right:0cm;margin-left:0cm;margin-bottom:0.0001pt"><font class="Apple-style-span" face="Arial"><br></font></div><div style="margin-top:0cm;margin-right:0cm;margin-left:0cm;margin-bottom:0.0001pt">
<font class="Apple-style-span" face="Arial">On 10 April 2012 Gustaf Nelhans wrote:</font></div></div><div style="margin-top:0cm;margin-right:0cm;margin-left:0cm;margin-bottom:0.0001pt;font-size:medium"><font><div><div><div lang="EN-US" link="blue" vlink="blue" style="word-wrap:break-word">
<div class="Section1"><div style="font-family:Helvetica"><blockquote type="cite"><div lang="EN-US" link="blue" vlink="blue" style="word-wrap:break-word"><div class="Section1"><font class="Apple-style-span" face="Arial"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span lang="EN-GB">Dear Professor Harnad, <br></span><span lang="EN-GB"> I believe that it is not always easy to identify the motives behind specific instances of self references (although in the case at hand, the number of mutual citations identified seem to speak for themselves…). The practice of self citation is (as you acknowledge) not in itself a bad thing, but the problem is how to distinguish its legitimate use from its abuse.</span></blockquote>
<p class="MsoNormal"></p></font></div></div></blockquote><font class="Apple-style-span" face="Arial">Agreed. And in fact tests comparing rankings and correlation patterns based on total citation counts with those based on citation counts minus self-citations tend to give very similar outcomes. Nevertheless, looking at individuals' counts with and without self-citations and comparing them with the population norms can raise a red flag, which can then be examined manually.<br>
</font><blockquote type="cite"><div lang="EN-US" link="blue" vlink="blue" style="word-wrap:break-word"><div class="Section1"><font class="Apple-style-span" face="Arial"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span lang="EN-GB">This is equally valid on the individual level as in editor-suggested references. I would like to draw into attention an exchange about these matters from 1997, where Eugene Garfield stated:<br></span><span lang="EN-GB">“Recognising the reality of the Matthew effect, I believe that an editor is justified in reminding authors to cite equivalent references from the same journal, if only because readers of that journal presumably have ready access to it. To call this “manipulation” seems excessive unless the references chosen are irrelevant or mere window dressing.” (</span>Garfield, Eugene. 1997. Editors are justified in asking authors to cite equivalent references from same journal. <i><span style="font-style:italic">BMJ</span></i> 314 (7096):1765. <a href="http://www.bmj.com/content/314/7096/1765.2.short">http://www.bmj.com/content/314/7096/1765.2.short</a><span lang="EN-GB"> )</span></blockquote>
<p class="MsoNormal"></p></font></div></div></blockquote></div><div style="font-family:Helvetica"><font class="Apple-style-span" face="Arial">Gene Garfield made this suggestion in 1997, before Open Access (OA) became a distinct possibility. In a world where the only way to access articles was through a subscription that your institution could afford, "preferentially cite this journal" might have had an ounce of validity -- alongside the obvious pound of self-interest.</font></div>
<div style="font-family:Helvetica"><font class="Apple-style-span" face="Arial"><br></font></div><div style="font-family:Helvetica"><font class="Apple-style-span" face="Arial">But no longer today.</font></div><div style="font-family:Helvetica">
<font class="Apple-style-span" face="Arial"><br></font></div><div style="font-family:Helvetica"><div><font class="Apple-style-span" face="Arial">It is outrageous for an editor to tell the author of an article to cite more articles from the editor's own journal because readers have "more access" to it. Rather, the editor should tell authors to self-archive their articles (Green OA) if they really want to make them more accessible.</font></div>
</div><div><blockquote type="cite" style="font-family:Helvetica"><div lang="EN-US" link="blue" vlink="blue" style="word-wrap:break-word"><div class="Section1"><font class="Apple-style-span" face="Arial"><p class="MsoNormal">
</p><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span lang="EN-GB"> My question is if there could exist any method of identifying “bad apples” that does not account for the specific context in the article in which the reference is placed.</span></blockquote>
</font></div></div></blockquote><div style="font-family:Helvetica"><font class="Apple-style-span" face="Arial">Only in a statistical, population-level sense. Individual anomalies flagged by the population metrics would still need to be examined manually.</font></div>
<div style="font-family:Helvetica"><font class="Apple-style-span" face="Arial"><br></font></div><div style="font-family:Helvetica"><font class="Apple-style-span" face="Arial">But automated text-analytic tools may eventually also become sensitive enough to make a contribution, sorting out some of the nature of the citation from the accompanying text, not just from the author/article/journal counts.</font></div>
<blockquote type="cite"><div lang="EN-US" link="blue" vlink="blue" style="word-wrap:break-word"><div class="Section1"><blockquote style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex" class="gmail_quote">
<font face="Arial"><span lang="EN-GB">In my understanding of the problem, </span>the proposed way of using statistical methods for identifying baselines for self citations in various fields could be one important step, but I wonder if it would suffice to make the identification process complete?</font></blockquote>
</div></div></blockquote><font class="Apple-style-span" face="Arial" style="font-family:Helvetica">It is a necessary but not a sufficient condition for answering all the kinds of questions one might have about uses and misuses of citations.</font></div>
<div style="font-family:Helvetica"><font class="Apple-style-span" face="Arial"><br></font></div><div style="font-family:Helvetica"><div><font class="Apple-style-span" face="Arial">In statistics there is always, and necessarily, a difference between population data and individual cases. Medical conditions are the best illustration: I have an illness. I want to be treated for my illness, and not given whatever, on average, works most often for people who have symptoms most like mine. (See Kahneman &amp; Tversky on the <a href="http://en.wikipedia.org/wiki/Base_rate_fallacy">base rate fallacy</a>.)</font></div>
<div><font class="Apple-style-span" face="Arial"><br></font></div><div><font class="Apple-style-span" face="Arial">"Bad" citations can be identified on a statistical basis by comparing two populations of citations, and perhaps even by comparing one individual's total citations with the population norms, to see whether there is something anomalous (such as excess self-citation swelling the citation count).</font></div>
<div><font class="Apple-style-span" face="Arial"><br></font></div><div><font class="Apple-style-span" face="Arial">But it won't tell you whether an individual citation is good or bad. It is possible to develop and apply automated text-analytic algorithms to the text surrounding a citation, to try to predict whether it is positive or negative, and such algorithms can even be "trained up" with corrective feedback based on human evaluation of whether each individual citation was positive or negative.</font></div>
<div><font class="Apple-style-span" face="Arial"><br></font></div><div><font class="Apple-style-span" face="Arial">But it's early days for both of these, and validating statistical predictors will first take an awful lot of individual hand-validation in order to test and improve the algorithms.</font></div>
<div><font class="Apple-style-span" face="Arial"><br></font></div><div><font class="Apple-style-span" face="Arial">But for journals or individuals it is definitely possible to check computationally whether they deviate from population norms/baselines, and then look at the cases that the population anomaly detectors single out, and check them manually to see whether they are indeed cases of bad faith, legitimate practice, or just statistical anomalies.</font></div>
</div><div style="font-family:Helvetica"><font class="Apple-style-span" face="Arial"><br></font></div><div style="font-family:Helvetica"><div><font class="Apple-style-span" face="Arial">Citation cartels (and many other systematic abuses) are more detectable if the entire corpus is openly accessible, precisely because everybody can then detect them: there is no need to wait to see whether proprietary database owners, who have other interests, get around to providing (or see fit to provide) the data needed to monitor and detect abuses.</font></div>
<div><font class="Apple-style-span" face="Arial"><br></font></div><div><font class="Apple-style-span" face="Arial">Global OA not only provides the open database, but it provides the (continuous) open means of flagging anomalies in the population pattern, checking them, and naming and shaming the cases where there really has been willful misuse or abuse.</font></div>
<div><font class="Apple-style-span" face="Arial"><br></font></div><div><font class="Apple-style-span" face="Arial">It's yet another potential application for crowd-sourcing.</font></div><div><font class="Apple-style-span" face="Arial"><br>
</font></div><div><font class="Apple-style-span" face="Arial">Stevan Harnad</font></div><div><font class="Apple-style-span" face="Arial"><br></font></div></div><div style="font-family:Helvetica"><div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:40px;font:normal normal normal 12px/normal 'Times New Roman'">
<font class="Apple-style-span" face="Arial"><b>Harnad, S. (2008) </b><a href="http://www.int-res.com/abstracts/esep/v8/n1/p103-107/"><span style="color:rgb(24,65,170)"><b>Validating Research Performance Metrics Against Peer Rankings</b></span></a><b>.<i> </i></b><i>Ethics in Science and Environmental Politics</i> 8 (1): 103-107. doi:10.3354/esep00088. (The Use And Misuse Of Bibliometric Indices In Evaluating Scholarly Performance.) <a href="http://eprints.ecs.soton.ac.uk/15619/"><span style="color:rgb(24,65,170)">http://eprints.ecs.soton.ac.uk/15619/</span></a></font></div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:40px;font:normal normal normal 12px/normal 'Times New Roman';min-height:15px"><font class="Apple-style-span" face="Arial"><b></b><br></font></div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:40px;font:normal normal normal 12px/normal 'Times New Roman'"><font class="Apple-style-span" face="Arial"><b>Harnad, S. (2009) </b><a href="http://eprints.ecs.soton.ac.uk/17142/"><span style="color:rgb(24,65,170)"><b>Open Access Scientometrics and the UK Research Assessment Exercise</b></span></a><b>. <i>Scientometrics</i> 79 (1) </b>Also in <i>Proceedings of 11th Annual Meeting of the International Society for Scientometrics and Informetrics</i> 11(1), pp. 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds. (2007) </font></div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:40px;font:normal normal normal 12px/normal 'Times New Roman';min-height:15px"><font class="Apple-style-span" face="Arial"><br></font></div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:40px;font:normal normal normal 12px/normal 'Times New Roman';color:rgb(22,54,238)"><font class="Apple-style-span" face="Arial"><span style="color:rgb(0,0,0)"><b>Harnad, S. (2009) </b><a href="http://openaccess.eprints.org/index.php?/archives/508-guid.html"><b>Multiple metrics required to measure research performance</b></a><b>. </b><a href="http://www.nature.com/nature/journal/v457/n7231/full/457785a.html">Nature (Correspondence) 457 (785) (12 February 2009<span style="font:normal normal normal 14px/normal 'Times New Roman'">)</span></a></span><span style="font:normal normal normal 14px/normal 'Times New Roman';color:rgb(0,0,0)"> </span></font></div>
<div><span style="font:normal normal normal 14px/normal 'Times New Roman';color:rgb(0,0,0)"><font class="Apple-style-span" face="Arial" size="3"><br></font></span></div></div></div><div style="font-family:Helvetica">
<font class="Apple-style-span" face="Arial"><br></font></div><div style="font-family:Helvetica"><div style="font-family:Helvetica;font-size:medium;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:40px;font:normal normal normal 12px/normal 'Times New Roman';color:rgb(22,54,238)">
<span style="font:normal normal normal 14px/normal 'Times New Roman';color:rgb(0,0,0)"><br></span></div></div></div></div></div></div></font></div>