<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>On Oct 15, 2014, at 8:46 PM, Andrew A. Adams <<a href="mailto:aaa@meiji.ac.jp">aaa@meiji.ac.jp</a>> wrote:</div><div><br class="Apple-interchange-newline"><blockquote type="cite">How many scholarly papers are on the Web? At least 114 million, professor <br>finds<br><br><a href="https://tinyurl.com/kogygol">https://tinyurl.com/kogygol</a><br>The Number of Scholarly Documents on the Public Web<br> Madian Khabsa, C. Lee Giles</blockquote><blockquote type="cite"> Published: May 09, 2014<br> DOI: 10.1371/journal.pone.0093949<br>PLOS One<br>Paper: <a href="https://tinyurl.com/pwefk88">https://tinyurl.com/pwefk88</a></blockquote><div><br></div>THE SOUND OF ONE HAND CLAPPING<br><div><br></div>Extremely interesting finding, but the question it raises can be </div><div>expressed by the old Maine (sexist) joke, which I will here</div><div>present in a gender-neutral way:</div><div><br></div><div>Old-Timer #1: “How’s yir spouse?”</div><div>Old-Timer #2: “Compayured to wot?”</div><div><br></div><div>27M articles are OA out of how many articles <i>published</i>?</div><div><br></div><div>(Not out of how many on the web, but out of how many</div><div>published? And published <i>when</i>?)</div><div><br></div><div>27M is a “dangling numerator.” We need to know the</div><div>denominator. (And also what the ratio was last year,</div><div>and the year before, so we know how fast it’s growing,</div><div>and whether it’s nearer to 10% or 100%.) </div><div><br></div><div>114M articles on the web is not the right denominator.</div><div><br></div><div><div>According to Ulrich’s Global Serials Directory <a href="http://ulrichsweb.com">http://ulrichsweb.com</a></div><div>there are 105,000 peer-reviewed journals. 
(I don’t know what</div><div>proportion are English-language, nor what proportion are</div><div>uncited, but never mind.)</div><div><br></div><div>Let us (under)estimate extremely conservatively that on average</div><div>they publish at least 15 articles each per year.</div><div><br></div><div>That makes at least 1.5M articles published per year (close to the </div><div>Bjork et al. estimate made in 2009 <a href="http://files.eric.ed.gov/fulltext/EJ837278.pdf">http://files.eric.ed.gov/fulltext/EJ837278.pdf</a> )</div><div><br></div><div>Now we need to know the date of publication of K & G's 27M OA articles.</div><div><br></div><div>And we need to estimate what proportion of the Ulrichs annual 1.5M </div><div>articles is among the total 114M articles found on the web, <i>per year of publication</i>.</div><div><br></div><div>And then we need to calculate what yearly proportion of that yearly subset </div><div>of Ulrichs is among those 27M articles that are OA.</div><div><br></div><div>The K & G ratio of 27M/114M = 24% is unfortunately not the </div><div>ratio we need, neither for the total ratio nor for the yearly ratio.</div><div><br></div><div>The total ratio would be almost meaningless without dates: The total ratio of all </div><div>journal articles ever published?</div><div><br></div><div>So only annual ratios make sense. But if 1.5M were the annual denominator, </div><div>we would then need to know the corresponding annual OA numerator.</div><div><br></div><div>In other words, we need an actual Ulrichs sample of the denominator for, say, </div><div>each of the last 10 years of publication, and then we need to know<i> what proportion </i></div><div><i>of those articles are OA, for each year</i> (the numerator).</div><div><br></div><div>Unfortunately, Ulrichs indexes only journals, not journal articles. 
For annual</div><div>journal articles one needs to use Thomson-Reuters Web of Science or</div><div>SCOPUS (and they only cover about 12% of Ulrichs -- but never mind, it’s</div><div>certainly a high-priority subset, and perhaps we can estimate the rest</div><div>from further sampling, the way Bjork et al did).</div><div><br></div><div>An <i>extremely</i> crude estimate might be derived from K & G's 27M, using 1.5M</div><div>as the annual denominator, if we had the publication dates for those 27M.</div><div>(Do K & G have those data?) I don’t think 114M is a suitable proxy for that</div><div>denominator.</div><div><br></div><div>I am sure that K & G’s ingenious method can be used to make estimates</div><div>of OA/published ratios by year (and by field). I hope that K & G will</div><div>go on to do so. It will be a great help in tracking the growth of OA.</div><div><br></div><div>Without at least that it still sounds to my ears like just the sound of one </div><div>hand clapping — rather like the download stats that individuals proudly </div><div>post in their CVs these days, without providing any norms, reference </div><div>points or baselines for comparison. Rather like a pharmaceutical company </div><div>that tells you how many patients who took their drug survived (without telling </div><div>you how many didn’t, nor how many patients didn’t take their drug, nor what</div><div>happened to those patients!).</div><div><br></div><div>Stevan Harnad</div><div><br></div><div><br></div></div><div><br></div><div><br><br></div><br></body></html>