[Koha-devel] Strange characters

dcook at prosentient.com.au
Thu Dec 17 03:30:33 CET 2020


So EF BF BD is the box with a question mark (�), i.e. the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.

If you look at commit 100e6a9808ead4ee8d951da59ead1550e75bb4c3, you'll see the following:

-# Casta\361eda, Carlos Sebastian - seba3c at yahoo.com.ar - Physics Library UNLP Argentina
+# Casta�eda, Carlos Sebastian - seba3c at yahoo.com.ar - Physics Library UNLP Argentina

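361 is the octal for ñ in ISO-8859-1/Windows-1252 (0xF1). That single byte is not valid UTF-8, so whatever converted the file replaced it with U+FFFD.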
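A minimal sketch of the mechanism, assuming a POSIX shell and an iconv that knows the ISO-8859-1 charset (GNU iconv does): converting the original byte with the correct source encoding yields ñ (UTF-8 bytes c3 b1), which is what the commit should have stored instead of ef bf bd. The helper name `latin1_to_utf8` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: re-encode ISO-8859-1 (Latin-1) text on stdin as UTF-8.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# The original byte: octal 361 = 0xF1 = ñ in Latin-1/Windows-1252.
printf 'Casta\361eda\n' | latin1_to_utf8        # shows Castañeda

# Correctly converted ñ as hex bytes (c3 b1) ...
printf '\361' | latin1_to_utf8 | od -An -tx1
# ... versus the replacement character left behind in the commit (ef bf bd).
printf '\357\277\275' | od -An -tx1
```

Note that U+FFFD is a one-way substitution: once a file contains ef bf bd, the original byte is gone and the ñ has to be restored by hand. Whole-file conversion with iconv only helps for files that still hold the raw Latin-1 byte.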

I wrote this Perl one-liner to find these:

find . -not -path '*/\.git/*' -type f -exec perl -lne 'print $ARGV if /\xef\xbf\xbd/' {} \; | sort -u
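A possibly faster variant of the same search, sketched here as an assumption rather than a tested replacement: GNU grep can look for the three bytes as a fixed string in one process instead of starting a perl per file. `--exclude-dir` is a GNU grep option, and `LC_ALL=C` makes grep compare raw bytes regardless of locale. The function name `find_replacement_chars` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR (default .) that contain
# U+FFFD REPLACEMENT CHARACTER, i.e. the UTF-8 bytes EF BF BD.
find_replacement_chars() {
    dir=${1:-.}
    # Build the pattern from octal escapes so this script itself stays ASCII.
    pat=$(printf '\357\277\275')
    # -r: recurse, -l: print file names only, -F: fixed string, not a regex.
    # Skip any directory named .git, like the find/perl version does.
    LC_ALL=C grep -rlF --exclude-dir=.git -- "$pat" "$dir" | sort -u
}
```

Run it from the top of a checkout as `find_replacement_chars .`; like the one-liner, it also reports binary files that happen to contain those bytes.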

I found the following in 19.11:

./koha-tmpl/intranet-tmpl/lib/datatables/datatables.js
	Seemingly intentional...
./koha-tmpl/intranet-tmpl/lib/datatables/datatables.min.js
	Seemingly intentional
./koha-tmpl/intranet-tmpl/lib/yui/plugins/loading-min.js
	Looks like a typo in a name like koha-news.pl
./misc/migration_tools/buildEDITORS.pl
	Lots in a commented-out section, which should probably just be deleted...
./misc/release_notes/release_notes_19_11_02.html
	Input error in an organisation name?
./misc/release_notes/release_notes_19_11_02.md
	Input error in an organisation name?
./t/db_dependent/data/marc21/zebraexport/biblio/exported_records
	In record data?
./tools/koha-news.pl
	You already know this one

Using vim, you can just search for �. 

David Cook
Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia

Office: 02 9212 0899
Online: 02 8005 0595

-----Original Message-----
From: Koha-devel <koha-devel-bounces at lists.koha-community.org> On Behalf Of Didier Gautheron
Sent: Thursday, 17 December 2020 3:03 AM
To: koha-devel <koha-devel at lists.koha-community.org>
Subject: Re: [Koha-devel] Strange characters

Hi,

On 16 December 2020 16:23, "Fridolin SOMERS" <fridolin.somers at biblibre.com> wrote:

> Hi,
> 
> I found some strange characters in sources :
> 
> https://git.koha-community.org/Koha-community/Koha/src/branch/master/tools/koha-news.pl#L7
> 
> I see a <?>:
> Casta?eda, Carlos Sebastian
> 
> Do you see that?

It seems to be valid UTF-8:
ef bf bd
Character name: REPLACEMENT CHARACTER
Likely from an old Windows file, with ñ being the culprit.

> Is this non-UTF-8?
> Can we build a command to find them all?
> I've tried with 'grep -P' but couldn't get it to work.

git grep can find them, with some false positives. Or maybe use iconv?
iconv -f utf8 -t utf8 should complain if there are invalid sequences, e.g.:

LANG=C iconv -f utf8 -t utf8 ./misc/cronjobs/automatic_renewals.pl > /dev/null
iconv: illegal input sequence at position 81
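Extending that iconv check to the whole tree could look like the sketch below, assuming a POSIX `find` and an iconv that exits non-zero on illegal input (GNU iconv does); the function name `find_invalid_utf8` is hypothetical. It reports a different set of files from the EF BF BD search: a file that already contains U+FFFD is valid UTF-8 and passes, while a file with a raw Latin-1 byte such as \361 fails. Binary files (images, fonts) will be reported too, so expect false positives.

```shell
#!/bin/sh
# Hypothetical helper: print every regular file under DIR (default .),
# outside .git, that iconv rejects as invalid UTF-8.
find_invalid_utf8() {
    dir=${1:-.}
    find "$dir" -path '*/.git' -prune -o -type f -print |
    while IFS= read -r f; do
        # iconv exits non-zero on an illegal input sequence; discard
        # both the converted output and the error message.
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done | sort
}
```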
Koha-devel mailing list
Koha-devel at lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/ git : http://git.koha-community.org/ bugs : http://bugs.koha-community.org/



