[Koha-bugs] [Bug 34549] The cataloguing editor allows you to input invalid data

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Wed Nov 1 02:27:14 CET 2023


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=34549

--- Comment #20 from David Cook <dcook at prosentient.com.au> ---
(In reply to Martin Renvoize from comment #15)
> Hmm,  whilst this certainly resolves the core issue.. I'd have loved to have
> seen some form of warning to the end user that their input data has been
> manipulated.
> 
> I'm not close enough to the differences between MARC-8 and UTF-8 encodings
> to know exactly what we're losing during the save.. the test case
> highlighted here is simple.. just dropping a hidden character.. no harm
> done.. however, might there be cases where the mis-encoded string getting
> stripped would result in worse data from the human perspective?  It would be
> good to somehow catch these sorts of misconfigurations and try to encourage
> end users to fix them.

I've been looking at this further and it looks like it's actually harder to get
a badly encoded record into Koha than I thought!

If I try to stage a Latin-1 record as a UTF-8 record, it fails. At the
moment the background job fails silently, but after adding some debugging
I saw that the error message is "Input is not proper UTF-8".
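
To illustrate why that staging fails, here is a minimal Python sketch (not
Koha code; the sample string and the is_valid_utf8 helper are made up for
illustration). A Latin-1 accented character is a single byte above 0x7F, and
on its own that byte is not a valid UTF-8 sequence, so a strict decoder
rejects it:

```python
# Hypothetical sample data: "café" encoded as Latin-1 (ISO-8859-1).
# The "é" becomes the single byte 0xE9.
latin1_bytes = "café".encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In UTF-8, 0xE9 announces a three-byte sequence, but nothing follows it,
# so strict decoding fails instead of silently producing bad data.
latin1_is_utf8 = is_valid_utf8(latin1_bytes)

# The same text encoded as real UTF-8 passes the check.
utf8_is_utf8 = is_valid_utf8("café".encode("utf-8"))
```

A check like this only catches byte sequences that are invalid UTF-8. It
would not catch data that was decoded with the wrong encoding and then
re-encoded as valid UTF-8.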

So I think some of the encoding issues I've seen probably come from
side-loaded records that were inserted directly into the database as part of a
data migration.

--

Also, as per my comments on bug 35104, it looks like Microsoft Edge has a
tendency to corrupt data (at least in PDFs), and users then paste in the
corrupted data, which includes control characters.

This change would work well for erasing those non-printable control
characters, though. I suppose there could be minor data loss, but that would
be down to the source data already being bad...
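
As a rough sketch of the idea (in Python, and not the actual patch from this
bug), stripping control characters while remembering what was removed could
look like the following. The function name, the choice to keep tab, newline
and carriage return, and the sample string are all assumptions for
illustration:

```python
import unicodedata

# Assumption: ordinary whitespace controls are legitimate and should be kept.
KEEP = {"\t", "\n", "\r"}


def strip_control_chars(text: str) -> tuple[str, list[str]]:
    """Remove Unicode control characters (category Cc) except those in KEEP.

    Returns the cleaned text and the list of removed characters, so that a
    caller could warn the user that their input was changed.
    """
    kept = []
    removed = []
    for ch in text:
        if unicodedata.category(ch) == "Cc" and ch not in KEEP:
            removed.append(ch)
        else:
            kept.append(ch)
    return "".join(kept), removed


# Hypothetical pasted value containing a unit separator and a NUL byte.
cleaned, removed = strip_control_chars("Title\x1f with\x00 junk")
```

Returning the removed characters rather than discarding them silently is one
way to support the kind of end-user warning asked for in comment #15.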

I'm trying to gather scenarios for bad data so that we can alert on them
well...
