<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=us-ascii" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 10.00.9200.17116"></HEAD>
<BODY
style="WORD-WRAP: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space">
<DIV dir=ltr align=left><SPAN class=942071113-16102014><FONT color=#0000ff><FONT
size=2 face=Arial>See article by Arif Jinha in <EM>Learned Publishing</EM>
from 2010 - </FONT>
<DIV class=page-heading>
<DIV class=heading-text>
<H1 class=abstract-heading><FONT size=2 face=Arial>50 million: an estimate of
the number of scholarly articles in existence </FONT></H1></DIV></DIV>
<DIV id=infoArticle>
<DIV class=supMetaData>
<P><FONT size=2 face=Arial>Jinha, Arif E</FONT></P>
<P><FONT size=2 face=Arial><A
href="http://dx.doi.org/10.1087/20100308">http://dx.doi.org/10.1087/20100308</A></FONT></P>
<P><SPAN class=942071113-16102014><FONT size=2
face=Arial>Sally</FONT></SPAN></P></DIV></DIV></FONT></SPAN></DIV>
<DIV align=left><FONT size=2 face=Arial>Sally Morris</FONT></DIV>
<DIV align=left><FONT size=2 face=Arial>South House, The Street, Clapham,
Worthing, West Sussex, UK BN13 3UU</FONT></DIV>
<DIV align=left><FONT size=2 face=Arial>Tel: +44 (0)1903
871286</FONT></DIV>
<DIV align=left><FONT size=2 face=Arial>Email:
sally@morris-assocs.demon.co.uk</FONT></DIV>
<DIV> </DIV><BR>
<DIV lang=en-us class=OutlookMessageHeader dir=ltr align=left>
<HR tabIndex=-1>
<FONT size=2 face=Tahoma><B>From:</B> goal-bounces@eprints.org
[mailto:goal-bounces@eprints.org] <B>On Behalf Of </B>Stevan
Harnad<BR><B>Sent:</B> 16 October 2014 13:10<BR><B>To:</B> Global Open Access
List (Successor of AmSci)<BR><B>Subject:</B> [GOAL] Re: 114 million scholarly
documents on the web;27 million toll-free<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV>On Oct 15, 2014, at 8:46 PM, Andrew A. Adams &lt;<A
href="mailto:aaa@meiji.ac.jp">aaa@meiji.ac.jp</A>&gt; wrote:</DIV>
<DIV><BR class=Apple-interchange-newline>
<BLOCKQUOTE type="cite">How many scholarly papers are on the Web? At least 114
million, professor <BR>finds<BR><BR><A
href="https://tinyurl.com/kogygol">https://tinyurl.com/kogygol</A><BR>The
Number of Scholarly Documents on the Public Web<BR> Madian Khabsa, C.
Lee Giles</BLOCKQUOTE>
<BLOCKQUOTE type="cite"> Published: May 09,
2014<BR> DOI: 10.1371/journal.pone.0093949<BR>PLOS ONE<BR>
Paper: <A
href="https://tinyurl.com/pwefk88">https://tinyurl.com/pwefk88</A></BLOCKQUOTE>
<DIV><BR></DIV>THE SOUND OF ONE HAND CLAPPING<BR>
<DIV><BR></DIV>Extremely interesting finding, but the question it raises can
be </DIV>
<DIV>expressed by the old Maine (sexist) joke, which I will here</DIV>
<DIV>present in a gender-neutral way:</DIV>
<DIV><BR></DIV>
<DIV>Old-Timer #1: “How’s yir spouse?”</DIV>
<DIV>Old-Timer #2: “Compayured to wot?”</DIV>
<DIV><BR></DIV>
<DIV>27M articles are OA out of how many articles <I>published</I>?</DIV>
<DIV><BR></DIV>
<DIV>(Not out of how many on the web, but out of how many</DIV>
<DIV>published? And published <I>when</I>?)</DIV>
<DIV><BR></DIV>
<DIV>27M is a “dangling numerator.” We need to know the</DIV>
<DIV>denominator. (And also what the ratio was last year,</DIV>
<DIV>and the year before, so we know how fast it’s growing,</DIV>
<DIV>and whether it’s nearer to 10% or 100%.) </DIV>
<DIV><BR></DIV>
<DIV>114M articles on the web is not the right denominator.</DIV>
<DIV><BR></DIV>
<DIV>
<DIV>According to Ulrich’s Global Serials Directory <A
href="http://ulrichsweb.com">http://ulrichsweb.com</A></DIV>
<DIV>there are 105,000 peer-reviewed journals. (I don’t know what</DIV>
<DIV>proportion are English-language, nor what proportion are</DIV>
<DIV>uncited, but never mind.)</DIV>
<DIV><BR></DIV>
<DIV>Let us (under)estimate extremely conservatively that on average</DIV>
<DIV>they publish at least 15 articles each per year.</DIV>
<DIV><BR></DIV>
<DIV>That makes at least 1.5M articles published per year (close to
the </DIV>
<DIV>Bjork et al. estimate made in 2009: <A
href="http://files.eric.ed.gov/fulltext/EJ837278.pdf">http://files.eric.ed.gov/fulltext/EJ837278.pdf</A>)</DIV>
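<DIV><BR></DIV>
<DIV>As a rough check of that arithmetic, here is a minimal sketch. The variable
names are illustrative only, and the 15-articles-per-journal figure is the
deliberately conservative assumption stated above, not a measured value.</DIV>
<PRE>
```python
# Back-of-envelope estimate of annual article output.
# Names are illustrative; the inputs are the two figures quoted in the text.

PEER_REVIEWED_JOURNALS = 105_000   # count attributed to Ulrich's above
MIN_ARTICLES_PER_JOURNAL = 15      # conservative per-journal assumption

# Lower bound on articles published per year under those assumptions
annual_articles = PEER_REVIEWED_JOURNALS * MIN_ARTICLES_PER_JOURNAL

print(f"Estimated articles per year: {annual_articles:,}")
```
</PRE>
<DIV>The product is a little above the rounded 1.5M used in the rest of this
message.</DIV>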
<DIV><BR></DIV>
<DIV>Now we need to know the date of publication of K & G's 27M OA
articles.</DIV>
<DIV><BR></DIV>
<DIV>And we need to estimate what proportion of the Ulrichs annual
1.5M </DIV>
<DIV>articles is among the total 114M articles found on the web, <I>per
year of publication</I>.</DIV>
<DIV><BR></DIV>
<DIV>And then we need to calculate what yearly proportion of that yearly
subset </DIV>
<DIV>of Ulrichs is among those 27M articles that are OA.</DIV>
<DIV><BR></DIV>
<DIV>The K & G ratio of 27M/114M = 24% is unfortunately not the </DIV>
<DIV>ratio we need, neither for the total ratio nor for the yearly ratio.</DIV>
<DIV><BR></DIV>
<DIV>The total ratio would be almost meaningless without dates: The total ratio
of all </DIV>
<DIV>journal articles ever published?</DIV>
<DIV><BR></DIV>
<DIV>So only annual ratios make sense. But if 1.5M were the annual
denominator, </DIV>
<DIV>we would then need to know the corresponding annual OA numerator.</DIV>
<DIV><BR></DIV>
<DIV>In other words, we need an actual Ulrichs sample of the denominator for,
say, </DIV>
<DIV>each of the last 10 years of publication, and then we need to know<I> what
proportion </I></DIV>
<DIV><I>of those articles are OA, for each year</I> (the numerator).</DIV>
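<DIV><BR></DIV>
<DIV>A minimal sketch of that per-year calculation, assuming one had article
counts keyed by publication year. The function name and both inputs are
hypothetical: no such per-year counts are given in this message, so the sketch
contains no data.</DIV>
<PRE>
```python
def annual_oa_ratios(published_by_year, oa_by_year):
    """Return {year: OA share} for each year with a usable denominator.

    published_by_year: articles published in each year (the denominator)
    oa_by_year: how many of those articles are OA (the numerator)
    Both mappings are hypothetical inputs, not data from any cited source.
    """
    ratios = {}
    for year in sorted(published_by_year):
        published = published_by_year[year]
        if published == 0:
            continue  # no denominator, so no meaningful ratio
        # Years missing from the numerator table count as zero OA articles
        ratios[year] = oa_by_year.get(year, 0) / published
    return ratios
```
</PRE>
<DIV>Comparing the resulting shares across successive years would show how
fast the OA proportion is growing, which a single pooled ratio cannot.</DIV>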
<DIV><BR></DIV>
<DIV>Unfortunately, Ulrichs indexes only journals, not journal articles. For
annual</DIV>
<DIV>journal articles one needs to use Thomson-Reuters Web of Science or</DIV>
<DIV>SCOPUS (and they only cover about 12% of Ulrichs -- but never mind,
it’s</DIV>
<DIV>certainly a high-priority subset, and perhaps we can estimate the
rest</DIV>
<DIV>from further sampling, the way Bjork et al did).</DIV>
<DIV><BR></DIV>
<DIV>An <I>extremely</I> crude estimate might be derived from K & G's 27M,
using 1.5M</DIV>
<DIV>as the annual denominator, if we had the publication dates for those
27M.</DIV>
<DIV>(Do K & G have those data?) I don’t think 114M is a suitable proxy for
that</DIV>
<DIV>denominator.</DIV>
<DIV><BR></DIV>
<DIV>I am sure that K & G’s ingenious method can be used to make
estimates</DIV>
<DIV>of OA/published ratios by year (and by field). I hope that K & G
will</DIV>
<DIV>go on to do so. It will be a great help in tracking the growth of OA.</DIV>
<DIV><BR></DIV>
<DIV>Without at least that, it still sounds to my ears like just the sound of
one </DIV>
<DIV>hand clapping — rather like the download stats that individuals
proudly </DIV>
<DIV>post in their CVs these days, without providing any norms,
reference </DIV>
<DIV>points or baselines for comparison. Rather like a pharmaceutical
company </DIV>
<DIV>that tells you how many patients who took their drug survived (without
telling </DIV>
<DIV>you how many didn’t, nor how many patients didn’t take their drug, nor
what</DIV>
<DIV>happened to those patients!).</DIV>
<DIV><BR></DIV>
<DIV>Stevan Harnad</DIV>
<DIV><BR></DIV>
<DIV><BR></DIV></DIV>
<DIV><BR></DIV>
<DIV><BR><BR></DIV><BR></BODY></HTML>