<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi, <br>
</p>
<p>I'd like to point the community to an SEO change applied to the
EPrints core. <br>
</p>
<p><a class="moz-txt-link-freetext" href="https://github.com/eprints/eprints/issues/450">https://github.com/eprints/eprints/issues/450</a><br>
---------------------------------------------</p>
<p><b>The problem</b><br>
<br>
A number of repository administrators have noticed that their
content is not featuring in Google Scholar search results.<br>
We have been in discussion with Google Scholar in regard to how it
discovers and indexes the contents of EPrints repositories.<br>
<br>
While EPrints is designed to present its content to Google in the
best way, Google Scholar is encountering issues with the initial
discovery of that content.<br>
Google’s crawler processes hundreds of billions of links, and it
needs a clearer way to identify that a link points to an EPrints
repository rather than a normal website.<br>
This would then allow Google Scholar to prioritise the crawling
and indexing. Google Scholar already has EPrints-specific rules in
its crawler, and they are happy to update them.<br>
</p>
<p><b>The solution</b><br>
<br>
Google Scholar and I have come up with a plan to increase the
discoverability of EPrints content.<br>
<br>
Currently, records on EPrints have URLs which look like<br>
<a class="moz-txt-link-freetext" href="http://YOUR-REPO/EPRINTID/">http://YOUR-REPO/EPRINTID/</a> e.g. <a class="moz-txt-link-freetext" href="http://irep.ntu.ac.uk/12853/">http://irep.ntu.ac.uk/12853/</a><br>
However, this is not easily identified as EPrints content without
visiting the actual page, and Google has a lot of pages to visit.<br>
<br>
We intend to promote the existing EPrints “URI” form of the links,
which is easily identified as being EPrints content:<br>
<a class="moz-txt-link-freetext" href="http://YOUR-REPO/id/eprint/EPRINTID/">http://YOUR-REPO/id/eprint/EPRINTID/</a> e.g.
<a class="moz-txt-link-freetext" href="http://irep.ntu.ac.uk/id/eprint/12853/">http://irep.ntu.ac.uk/id/eprint/12853/</a><br>
Currently the longer form of the URL redirects to the shorter
version. We would like to swap that around so that the shorter
form redirects to the longer version.<br>
That way no existing links will stop working, but references to
your repository and, more importantly, Google's indexer will
gradually move to the longer, identifiable version.<br>
<br>
Document URLs would need to change in a similar way. Again, any
existing links would continue to work, but the promoted version of
the links would change from<br>
<a class="moz-txt-link-freetext" href="http://irep.ntu.ac.uk/12853/1/185527_3220%20Heasell%20prepublilsher.pdf">http://irep.ntu.ac.uk/12853/1/185527_3220%20Heasell%20prepublilsher.pdf</a><br>
to<br>
<a class="moz-txt-link-freetext" href="http://irep.ntu.ac.uk/id/eprint/12853/1/185527_3220%20Heasell%20prepublilsher.pdf">http://irep.ntu.ac.uk/id/eprint/12853/1/185527_3220%20Heasell%20prepublilsher.pdf</a><br>
</p>
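<p>To make the mapping concrete, here is a minimal Python sketch of
how a short-form URL corresponds to its long form. It is only an
illustration, not the code in the EPrints core: the function names
and the regular expression are hypothetical, and it assumes the
short form always starts with a numeric eprint ID.</p>
<pre>
```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed pattern (not taken from EPrints): a path that starts with a
# numeric eprint ID, optionally followed by a document position and filename.
SHORT_PATH = re.compile(r"^/(\d+)(/.*)?$")


def to_long_url(url):
    """Return the /id/eprint/ form of a short-form EPrints URL.

    URLs that do not match the short form (including URLs already in
    the long form) are returned unchanged.
    """
    parts = urlsplit(url)
    match = SHORT_PATH.match(parts.path)
    if match is None:
        return url
    eprint_id = match.group(1)
    rest = match.group(2) or "/"
    long_path = "/id/eprint/" + eprint_id + rest
    return urlunsplit(
        (parts.scheme, parts.netloc, long_path, parts.query, parts.fragment)
    )


def is_long_url(url):
    """True if the URL path carries the identifiable /id/eprint/ prefix."""
    return urlsplit(url).path.startswith("/id/eprint/")


print(to_long_url("http://irep.ntu.ac.uk/12853/"))
```
</pre>
<p>The second helper shows why the long form helps a crawler: the
/id/eprint/ prefix can be recognised from the URL alone, without
fetching the page.</p>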
<p><br>
</p>
<p>We have made the changes described above locally and they have
proved successful.<br>
We have now also applied the changes to the EPrints core.<br>
These changes can be enabled by updating your 20_base_urls.pl to
include<br>
$c->{use_long_url_format} = 1;<br>
<br>
If you apply these changes and would like Google Scholar to
prioritise a reindex of your repository, get in touch with us and
we’ll pass the message along to them.</p>
<p><br>
</p>
<p>Justin/Jiadi<br>
</p>
<p><br>
</p>
</body>
</html>