<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 24, 2017 at 12:19 PM, Heather Morrison <span dir="ltr"><<a href="mailto:Heather.Morrison@uottawa.ca" target="_blank">Heather.Morrison@uottawa.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
hi Peter,
<div><br>
</div>
<div>If many knowledge projects are advancing our knowledge through the means that you have described, surely there are others than the one you started yesterday? Can you provide a list or literature review of such studies?</div></div></blockquote><div><br></div><div>There are literally thousands. In biomedicine alone there are many conferences and competitions. An overview is given in <a href="https://en.wikipedia.org/wiki/Biomedical_text_mining">https://en.wikipedia.org/wiki/Biomedical_text_mining</a> .<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<div><br>
</div>
<div>My OA APC study uses data from different sources that do not have a common set of terms: </div>
<div><a href="http://dataverse.scholarsportal.info/dataverse" target="_blank">dataverse.scholarsportal.info/<wbr>dataverse</a></div>
<div><br>
</div>I would like to note some methodological concerns with the approach described by PMC </div></blockquote><div><br></div><div>I assume you mean me, PMR, not (Europe)PubMedCentral.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>(automatically gathering data from tables). Taking data from different studies without fully accounting for differences in methods (e.g. definition or measurement) could
easily lead to false conclusions. Worse, such false conclusions would be highly replicable, leading to false confidence in results, i.e. anyone could repeat the same mistakes and come to the same conclusion of unknown external validity.
</div></blockquote><div><br></div><div>It is very sad to be severely criticised by a scholar who has not read my work, proposal, and website and does not understand what I am doing. There are many cases where the data format I extract from allows precise metrics on recall and precision of the character stream (in the current case I expect >> 99%). You do not know my purpose - which you describe as "false conclusions". In fact the output will be routed to expert human reviewers and will save 90% of their time. <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><br>
</div>
<div>For the 2016/17 OA APC dataset I am adding a "providence" </div></div></blockquote><div><br></div><div>I assume you mean "provenance"<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div>column because the data in the 2016 APC column comes from different researchers with some differences in data collection. Even in a single dataset, to analyze one needs to understand when you are
comparing apples with apples or macintoshes with Spartans. Automating data analysis without full comprehension of the data strikes me as problematic.</div></div></blockquote><div><br></div><div>This assertion that I do not have full comprehension and that my work is problematic is unworthy. I have pioneered automatic extraction of chemistry and of crystallography over 40 years and have been honoured by scientific societies for doing so. I have defined the data extraction process, shown how it can be aggregated, provided metrics and pioneered technology that has led to several thousand papers (by people who have built on my work). <br><br></div><div>Peter<br></div><div> <br></div><br clear="all"></div><br>-- <br><div class="gmail_signature"><div dir="ltr"><div>Peter Murray-Rust<br>Reader Emeritus in Molecular Informatics<br>Unilever Centre, Dept. Of Chemistry<br>University of Cambridge<br>CB2 1EW, UK<br>+44-1223-763069</div></div></div>
</div></div>