[GOAL] Re: 114 million scholarly documents on the web; 27 million toll-free
Stevan Harnad
harnad at ecs.soton.ac.uk
Thu Oct 16 13:09:39 BST 2014
On Oct 15, 2014, at 8:46 PM, Andrew A. Adams <aaa at meiji.ac.jp> wrote:
> How many scholarly papers are on the Web? At least 114 million, professor
> finds
>
> https://tinyurl.com/kogygol
> The Number of Scholarly Documents on the Public Web
> Madian Khabsa, C. Lee Giles
> Published: May 09, 2014
> DOI: 10.1371/journal.pone.0093949
> PLOS ONE paper: https://tinyurl.com/pwefk88
THE SOUND OF ONE HAND CLAPPING
Extremely interesting finding, but the question it raises can be
expressed by the old Maine (sexist) joke, which I will here
present in a gender-neutral way:
Old-Timer #1: “How’s yir spouse?”
Old-Timer #2: “Compayured to wot?”
27M articles are OA out of how many articles published?
(Not out of how many on the web, but out of how many
published? And published when?)
27M is a “dangling numerator.” We need to know the
denominator. (And also what the ratio was last year,
and the year before, so we know how fast it’s growing,
and whether it’s nearer to 10% or 100%.)
114M articles on the web is not the right denominator.
According to Ulrich’s Global Serials Directory http://ulrichsweb.com
there are 105,000 peer-reviewed journals. (I don’t know what
proportion are English-language, nor what proportion are
uncited, but never mind.)
Let us (under)estimate extremely conservatively that on average
they publish at least 15 articles each per year.
That makes at least 1.5M articles published per year (close to the
Bjork et al. estimate made in 2009: http://files.eric.ed.gov/fulltext/EJ837278.pdf ).
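For anyone who wants to check the arithmetic, here is a minimal sketch. The two inputs are the figures quoted above (the Ulrich's journal count and the deliberately low guess of 15 articles per journal per year); the variable names are mine and purely illustrative.

```python
# Back-of-envelope check of the annual-output estimate.
JOURNALS = 105_000          # peer-reviewed journals, per Ulrich's (as cited above)
ARTICLES_PER_JOURNAL = 15   # conservative per-journal annual average (a guess, not a measurement)

annual_articles = JOURNALS * ARTICLES_PER_JOURNAL
print(f"Estimated articles per year: {annual_articles:,}")
```

Since 15 per journal is meant as a floor, the product should be read as a lower bound rather than a point estimate.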
Now we need to know the date of publication of K & G's 27M OA articles.
And we need to estimate what proportion of the Ulrichs annual 1.5M
articles is among the total 114M articles found on the web, per year of publication.
And then we need to calculate what yearly proportion of that yearly subset
of Ulrichs is among those 27M articles that are OA.
The K & G ratio of 27M/114M = 24% is unfortunately not the
ratio we need, neither for the total ratio nor for the yearly ratio.
The total ratio would be almost meaningless without dates: a ratio over all
journal articles ever published?
So only annual ratios make sense. But if 1.5M were the annual denominator,
we would then need to know the corresponding annual OA numerator.
In other words, we need an actual Ulrichs sample of the denominator for, say,
each of the last 10 years of publication, and then we need to know what proportion
of those articles are OA, for each year (the numerator).
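To make the numerator/denominator point concrete, here is a rough sketch of the calculation I have in mind. The function name and the per-year counts are hypothetical; the counts are invented placeholders that only exercise the code, not data from K & G, Ulrich's, or anyone else. The only real figures are the 27M and 114M totals.

```python
def annual_oa_ratios(oa_by_year, published_by_year):
    """Return {year: OA share} for every year present in both inputs."""
    ratios = {}
    for year, published in sorted(published_by_year.items()):
        if year in oa_by_year and published > 0:
            ratios[year] = oa_by_year[year] / published
    return ratios

# The one ratio the reported totals do give: OA documents over documents on the web.
web_ratio = 27_000_000 / 114_000_000
print(f"OA / on-the-web: {web_ratio:.0%}")

# Placeholder counts, invented purely to exercise the function. NOT real data.
published = {2012: 1000, 2013: 1000}
oa = {2012: 100, 2013: 200}
print(annual_oa_ratios(oa, published))
```

With real per-year counts in place of the placeholders, the output would show both the level of OA for each publication year and how fast it is growing, which the single on-the-web ratio cannot.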
Unfortunately, Ulrichs indexes only journals, not journal articles. For annual
journal articles one needs to use Thomson-Reuters Web of Science or
SCOPUS (and they only cover about 12% of Ulrichs -- but never mind, it’s
certainly a high-priority subset, and perhaps we can estimate the rest
from further sampling, the way Bjork et al did).
An extremely crude estimate might be derived from K & G's 27M, using 1.5M
as the annual denominator, if we had the publication dates for those 27M.
(Do K & G have those data?) I don’t think 114M is a suitable proxy for that
denominator.
I am sure that K & G’s ingenious method can be used to make estimates
of OA/published ratios by year (and by field). I hope that K & G will
go on to do so. It will be a great help in tracking the growth of OA.
Without at least that it still sounds to my ears like just the sound of one
hand clapping — rather like the download stats that individuals proudly
post in their CVs these days, without providing any norms, reference
points or baselines for comparison. Rather like a pharmaceutical company
that tells you how many patients who took their drug survived (without telling
you how many didn’t, nor how many patients didn’t take their drug, nor what
happened to those patients!).
Stevan Harnad