[Koha-bugs] [Bug 22223] Item url double-encode when parameter is an encoded URL

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Thu Jul 30 08:07:45 CEST 2020


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=22223

--- Comment #21 from David Cook <dcook at prosentient.com.au> ---
Actually, I'm going back to thinking that the "url" filter should be removed in
this case.

The 2005 URI standard says at https://tools.ietf.org/html/std66#section-2.4
that "the only time when octets within a URI are percent-encoded is during the
process of producing the URI from its component parts.  This is when an
implementation determines which of the reserved characters are to be used as
subcomponent delimiters and which can be safely used as data.  Once produced, a
URI is always in its percent-encoded form."

This is the only safe time to do the uri encoding.

And if you look at the "uri" filter for Template Toolkit at
http://www.template-toolkit.org/docs/manual/Filters.html#section_uri, that's
how they expect percent-encoding to be done: building a URI from its
component parts and escaping each part at the point of assembly.
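To illustrate that component-parts approach, here is a rough Python equivalent (the base URL and parameters are made up for illustration; this is not Koha code):

```python
from urllib.parse import urlencode

# Hypothetical base URL and query parameters. The point is that each
# *value* is percent-encoded individually, at assembly time, while the
# delimiters we add ourselves ("?", "=", "&") are left alone.
base = "https://example.com/search"
params = {"q": "cats & dogs", "year": "2020"}

url = base + "?" + urlencode(params)
print(url)  # https://example.com/search?q=cats+%26+dogs&year=2020
```

Because the implementation knows which characters are data and which are delimiters, there is never any question of encoding something twice.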

The "url" filter for encoding whole URLs in Template Toolkit is highly
problematic. I can certainly see the appeal. After all, say someone submits a
URL to a web form and you want to show them their URL on the response page.
Technically speaking, you should decompose the URL and then rebuild it
from its component parts. The "url" filter is a convenient shortcut, but it
seems technically incorrect.

So maybe we shouldn't use the "url" filter... but we need to do *something*.

The 2005 URI standard is dogmatic. Practically speaking, Koha is given whole
URLs by library staff members. It's not building URIs itself from component
parts. 

In theory, the library staff members should be passing in URLs that are already
encoded, but in practice that is unlikely to happen, unless they're
copying/pasting from somewhere else, and even then it may be hit or miss.

In theory, we shouldn't be encoding the URL at the template level, since it
should already have been encoded when it was created... but, as above, we can't
trust that.

Perhaps we should implement our own filter that first parses the URI and then
decodes its component parts before re-encoding its component parts. 
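A minimal sketch of such a filter in Python (not Koha's actual code; `normalize_url` and its `safe` character sets are my own assumptions, standing in for whatever a real TT filter would do):

```python
from urllib.parse import urlsplit, urlunsplit, unquote, quote

def normalize_url(url):
    """Hypothetical filter: parse the URL, decode each component, then
    re-encode it, so the result is percent-encoded exactly once whether
    the input was raw or already encoded."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(unquote(parts.path), safe="/"),
        quote(unquote(parts.query), safe="=&"),
        quote(unquote(parts.fragment), safe=""),
    ))

# Raw and pre-encoded input converge on the same single encoding:
print(normalize_url("https://example.com/a b"))    # https://example.com/a%20b
print(normalize_url("https://example.com/a%20b"))  # https://example.com/a%20b
```

The catch is that the `unquote()` step decodes a string the librarian may have meant literally, which is exactly the hazard the standard warns about.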

Of course, https://tools.ietf.org/html/std66#section-2.4 also says
"Implementations must not percent-encode or decode the same string more than
once, as decoding an already decoded string might lead to misinterpreting a
percent data octet as the beginning of a percent-encoding, or vice versa in the
case of percent-encoding an already percent-encoded string."

So... technically speaking this is kind of unsolvable in terms of strict
adherence to the standard? 
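The ambiguity is easy to demonstrate with Python's `quote`/`unquote` standing in for any percent-codec (the strings are made up for illustration):

```python
from urllib.parse import quote, unquote

# "a%20b" is ambiguous on its own: it could be the encoded form of
# "a b", or a literal string containing the three characters "%20".
assert unquote("a%20b") == "a b"             # one interpretation
assert quote("a%20b", safe="") == "a%2520b"  # the other, encoded again

# Encoding twice mangles the URL -- the double-encode this bug is about:
assert quote(quote("a b", safe=""), safe="") == "a%2520b"
```

Without knowing whether the input was already encoded, no filter can pick the right interpretation, which is why the standard says to encode only once, at production time.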

The problem being of course the human element. If we were mechanically building
URLs, encoding them, sending them, decoding them, and using them... it would
all be fine. 

The problem is human input. 

With the OPAC, we might accept a URL, but it would just be stored as text data.
But with the staff interface, we're actually using it in HTML...

The most practical option in my mind is to just not use the "url" filter on
this field, because I think that we sort of have to assume that the librarian
has put in a properly encoded URL in the first place.

-- 
You are receiving this mail because:
You are watching all bug changes.

