[Koha-bugs] [Bug 35812] New: Should specify canonical URLs to help search indexers

bugzilla-daemon at bugs.koha-community.org
Mon Jan 15 10:53:07 CET 2024


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=35812

            Bug ID: 35812
           Summary: Should specify canonical URLs to help search indexers
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P5 - low
         Component: OPAC
          Assignee: oleonard at myacpl.org
          Reporter: schodkowy.omegi-0r at icloud.com
        QA Contact: testopia at bugs.koha-community.org

We recently added our Koha site to Google Search Console and were promptly
bombarded with complaints about pages lacking a defined canonical URL. This is
mostly a problem for the bib details page.

For the main page, there are three possible URLs that all load the same page:
- /
- /index.html
- /cgi-bin/koha/opac-main.pl

I think "/" should be specified as the canonical URL. This could even be
achieved "lazily" with the following Apache directives, which could be added to
debian/templates/apache-shared-opac.conf (they will append the Link header to
all three URLs above and nothing else):

Header always append "Link" "</>; rel=canonical" "expr=%{REQUEST_URI} == '/index.html'"
Header always append "Link" "</>; rel=canonical" "expr=%{REQUEST_URI} == '/cgi-bin/koha/opac-main.pl'"

The situation gets more complicated with bibs, because solving it with Apache
directives becomes complex, if it is possible at all, so another solution is
probably needed (note that we can also use an HTML tag instead of an HTTP
header). Koha links to pages with URLs in the format
`/cgi-bin/koha/opac-detail.pl?biblionumber=1234`, but the Apache config already
defines nice links in the format `/bib/1234`. With the `opac-detail.pl` page
you also end up with junk parameters appended, for example after reaching the
page from a Koha search, giving URLs such as:
- /cgi-bin/koha/opac-detail.pl?biblionumber=1234
- /cgi-bin/koha/opac-detail.pl?biblionumber=1234&query_desc=se,phr:"Something"
- /cgi-bin/koha/opac-detail.pl?biblionumber=1234&query_desc=an:3200
- /cgi-bin/koha/opac-detail.pl?biblionumber=1234&query_desc=an:1846 and su-to:Something and itype:BK and su-to:Environment and su-to:Environment
- /cgi-bin/koha/opac-detail.pl?biblionumber=1234&fbclid=FSD8ufs98jf39jfes80jfds8jfsd
- /cgi-bin/koha/opac-detail.pl?bib=1234
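
As a rough illustration of what "done in code" could look like for the variants
listed above, here is a minimal Python sketch (Koha itself is Perl, so this is
only a model of the logic, not a patch). The function name `canonical_bib_path`
is hypothetical. It assumes the bib number arrives in either the `biblionumber`
or the `bib` parameter, as in the examples, and that every other parameter can
be discarded.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def canonical_bib_path(url: str) -> Optional[str]:
    """Map an opac-detail.pl URL to '/bib/<number>', or None if it does not apply."""
    parts = urlsplit(url)
    if not parts.path.endswith("/opac-detail.pl"):
        return None
    params = parse_qs(parts.query)
    # Both 'biblionumber' and the shorter 'bib' appear in the URLs listed above.
    for key in ("biblionumber", "bib"):
        values = params.get(key)
        if values and values[0].isascii() and values[0].isdigit():
            # Everything else (query_desc, fbclid, ...) is deliberately dropped.
            return f"/bib/{values[0]}"
    return None
```

Under these assumptions, every URL in the list above maps to the same
`/bib/1234` path, which is the value that would go into the canonical link.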

To top it off, if you run a search and there is only one result, you get
redirected directly to opac-detail, confusing Google even further (I actually
think it's worth a separate bug to add some kind of notice that you were
redirected because there was only one result, but that's a matter for another
discussion...).

So the solution here would also be to specify the canonical link as
"/bib/1234" with no params, but I think this is best done in code rather than
in the Apache config.
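
Since the canonical link can be delivered either as an HTML tag or as an HTTP
header, a code solution could emit either form. The sketch below shows both in
Python, purely as an illustration. The helper names are hypothetical, and the
header value follows the same `<target>; rel=canonical` shape used in the
Apache directives earlier in this report.

```python
from html import escape


def canonical_link_tag(href: str) -> str:
    """HTML form, intended for the <head> of the page."""
    # Escape the href so characters such as & or " cannot break the attribute.
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


def canonical_link_header(href: str) -> str:
    """Value for an HTTP 'Link' response header."""
    return f"<{href}>; rel=canonical"
```

For a bib page, both helpers would be called with "/bib/1234" regardless of
which parameters were present on the request.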

The end result would be that search engine results have nice links, without
odd params such as query_desc attached to the indexed pages. Those are awful,
because a user arriving from a search engine has nothing to do with the query
that happened to be in the URL when the page was indexed.

This would also solve another potential problem: such a silent redirect from
search results could make a search engine think that the search URL is the
canonical URL. When no canonical URL is specified, it's up to the search engine
to detect duplicates and decide, and it sometimes does so wrongly and against
its own rules, as we've observed.

Finally, the search URLs could also be canonicalized from
"/cgi-bin/koha/opac-search.pl" to "/search", this time keeping all URL query
parameters.
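
The search case differs from the bib case in that the query string has to
survive untouched. A minimal Python sketch of that rewrite follows. The function
name `canonical_search_url` is hypothetical, and the sketch assumes that only
the exact path "/cgi-bin/koha/opac-search.pl" should be rewritten.

```python
from urllib.parse import urlsplit, urlunsplit

SEARCH_SCRIPT = "/cgi-bin/koha/opac-search.pl"


def canonical_search_url(url: str) -> str:
    """Swap the script path for the '/search' alias, keeping the query string as-is."""
    parts = urlsplit(url)
    if parts.path != SEARCH_SCRIPT:
        return url  # not a search URL, leave it alone
    # The fragment is dropped; it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, "/search", parts.query, ""))
```

The query string is passed through without parsing or re-encoding, so the
parameters keep their original order and spelling.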

All of this is somewhat related to Bug 18410, but not blocked by it by any
means, as the URL aliases I've described above have been in the Apache config
for a long time and don't depend on an overall Koha routing overhaul.

Btw: it's better to use the full OPAC base URL instead of just "/" if we can.
This will help with deduplicating indexed URLs in case your Koha is available
under multiple aliased domains or is in the process of changing domains (or is
technically available under both http and https without a redirect, since
search engines treat those as separate sites!). But doing so requires a pure
code solution instead of Apache directives.
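
To illustrate why an absolute canonical URL helps here, the Python sketch below
builds the link from a single configured base URL rather than from the host of
the incoming request. The function name `absolute_canonical` and the example
domain are hypothetical. Where the base URL would come from in Koha (for
instance a system preference) is an assumption and is not specified here.

```python
def absolute_canonical(opac_base_url: str, path: str) -> str:
    """Join the configured base URL and a canonical path with exactly one slash."""
    # The request's own scheme and host are ignored on purpose: every alias
    # domain, and both http and https, end up advertising the same URL.
    return opac_base_url.rstrip("/") + "/" + path.lstrip("/")
```

With a base of "https://catalog.example.org", a bib page would always advertise
"https://catalog.example.org/bib/1234", no matter which alias or scheme served
the request.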


