[GOAL] How much of the content in open repositories is able to meet the definition of open access?
Peter Murray-Rust
pm286 at cam.ac.uk
Tue Jan 24 13:31:18 GMT 2017
On Tue, Jan 24, 2017 at 12:19 PM, Heather Morrison <
Heather.Morrison at uottawa.ca> wrote:
> hi Peter,
>
> If many knowledge projects are advancing our knowledge through the means
> that you have described, surely there are others than the one you started
> yesterday? Can you provide a list or literature review of such studies?
>
There are literally thousands. In biomedicine alone there are many
conferences and competitions. An overview is given in
https://en.wikipedia.org/wiki/Biomedical_text_mining .
>
> My OA APC study uses data from different sources that do not have a common
> set of terms:
> dataverse.scholarsportal.info/dataverse
>
> I would like to note some methodological concerns with the approach
> described by PMC
>
I assume you mean me, PMR, not (Europe) PubMed Central.
> (automatically gathering data from tables). Taking data from different
> studies without fully accounting for difference in methods (eg definition
> or measurement) could easily lead to false conclusions. Worse, such false
> conclusions would be highly replicable leading to false confidence in
> results, ie anyone could repeat the same mistakes and come to the same
> conclusion of unknown external validity.
>
It is very sad to be severely criticised by a scholar who has not read my
work, proposal, and website and does not understand what I am doing. There
are many cases where the data format I extract from allows precise metrics
on recall and precision of the character stream (in the current case I
expect >> 99%). You do not know my purpose - which you describe as "false
conclusions". In fact the output will be routed to expert human reviewers
and will save 90% of their time.
>
> For the 2016/17 OA APC dataset I am adding a "providence"
>
I assume you mean "provenance".
> column because the data in the 2016 APC column comes from different
> researchers with some differences in data collection. Even in a single
> dataset, to analyze one needs to understand when you are comparing apples
> with apples or macintoshes with Spartans. Automating data analysis without
> full comprehension of the data strikes me as problematic.
>
This assertion that I do not have full comprehension and that my work is
problematic is unworthy. I have pioneered automatic extraction of chemistry
and of crystallography over 40 years and have been honoured by scientific
societies for doing so. I have defined the data extraction process, shown
how it can be aggregated, provided metrics and pioneered technology that
has led to several thousand papers (by people who have built on my work).
Peter
--
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069