[Koha-devel] externalising translator and etc

Sun Nov 14 22:51:11 CET 2010

Hi,

On Sun, Nov 14, 2010 at 11:48 AM, LAURENT Henri-Damien
<henridamien.laurent at biblibre.com> wrote:
> My idea for that is that tracking addition of zebra indexes and all that
> stuff would be considerably eased if there was an initial etc repository
> that the installer clones into /home/koha/kohadev

Since adding a Zebra index is sometimes paired with an update to
template files, splitting the configuration into a separate repository
would actually make it more difficult to track Zebra updates.  It gets
worse if you consider changes such as adding or modifying an
authentication module, for which the default configuration changes are
almost invariably associated with code changes.  What you're proposing
would make it more difficult to cleanly manage Koha development that
touches the default configuration files; among other problems, Git's
submodule support is such that users and developers would have to do
more than a simple git pull or git fetch/rebase to ensure that they've
fetched updates both to Koha and to its configuration files.

> We create a git repository for all the installations, but when you
> create the repository from the processed files, it loses
> synchronization with all the common indexes which could be required for
> Koha when one adds some other minor feature or fixes a bug in the indexer.

Since it sounds like you're using dev-mode deployments for your
customers, you (i.e., BibLibre, not necessarily you personally) could
just as readily keep track of local customizations in the git clone
for each installation you run, then do a make update_zebra_conf.  I'm
sure that there are a variety of ways to script that, and perhaps some
patches to Makefile.PL would help you better than splitting out etc
into a submodule.

> Another reason is that etc does not vary much. But when it varies, when
> you upgrade users should be aware that they might lose their custom
> indexes. I wanted to make next upgrades smoother for libraries.

My suggestion immediately above should help you, but I also want to
point out that make upgrade *does* create backups of any files it
touches, so local changes are preserved that way.

I am not suggesting that there aren't things that could be done to
make things easier for anybody who runs a lot of dev-mode Koha
installations, but sticking etc off into a submodule is not the
solution.

> I am with you when you propose to use different directories. This would
> lead to:
> etc
>  |- koha-conf.xml or koha-conf.yml i.e. ONLY Koha common configuration
> (database access and so on. No more zebra stuff in that. )

I would support that.  It's mostly a historical accident that
koha-conf.xml currently serves as both the main configuration file for
Koha and the top-level config file for zebrasrv, but there's no reason
why those two configuration functions couldn't be placed in separate
files.

>  |- authentication
>  ||- LDAP
>  ||- CAS
>  |-webserver
>  ||- apache2
>  ||- nginx
>  |- searchengine
>  ||-zebradb
>  ||-solr
>  ||-pazpar2

I'm OK with splitting the configuration files.

> But then, when one chooses one type of webserver, one type of
> authentication, one type of searchengine, he would use only a few of all
> the installation files (which could become quite a forest).
> I wanted the structure for etc to be simpler so that sysadmins would not be
> overwhelmed by the big picture.

But all of the options have to exist *somewhere*, and it would be
simpler to manage the development of the various options if the
configuration files and directories were all laid out directly in the
Git repository, not relegated to topic branches.  Furthermore, if all
of the possible configuration files are available in a production
installation, it would be easier for a sysadmin to (say) switch from
Apache to nginx.

In other words, I think it is better to organize the configuration
files well (and document them!) than to effectively atomize the
management of them during Koha development by having permanent topic
branches for various configuration modes.

>> 0 for moving the PO files to a separate Git project.  The size of the
>> repository doesn't really strike me as a big deal; the Git protocol is
>> pretty efficient.  That said, while I don't see a great deal of
>> benefit to splitting the translations off into a separate repository,
>> I don't see much harm either.

As I said, I don't have a strong opinion either way -- that's what
the 0 means -- but I do think there are a couple of misconceptions to
clear up:

> Well it is quite striking when you get 249 MB to download when doing a git
> clone. :)

Not sure where you're getting 249M from -- it's more like 151 MiB when
I measured today.

> (mainly because any change you commit on po files is storing a
> new instance of this file)

Actually, no, it doesn't.  Git is better designed than that; once
objects are packed, what it generally stores for a change to a PO file
is just the delta.  git gc, which does the packing, is run on the public
repo every week.  When you push or pull to a repository, Git transfers
just compressed deltas.
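
As a rough illustration of why a small edit to a large PO file need not
cost a full extra copy, here is a toy Python sketch.  It is not Git's
actual pack format (Git uses its own binary delta encoding when objects
are packed); it only compares a zlib-compressed full copy against a
zlib-compressed unified diff.  The file contents and the make_po helper
are made up for the example.

```python
import difflib
import zlib


def make_po(n_entries: int) -> str:
    """Build a toy PO-style file (hypothetical content, not a real Koha catalog)."""
    lines = []
    for i in range(n_entries):
        lines.append(f'msgid "String number {i}"')
        lines.append(f'msgstr "Chaîne numéro {i}"')
        lines.append("")
    return "\n".join(lines)


# One translator edit to a single entry in a 2000-entry file.
old = make_po(2000)
new = old.replace('msgstr "Chaîne numéro 1000"', 'msgstr "Chaîne n° 1000"')

# Cost of storing the new version as a whole compressed copy.
full_size = len(zlib.compress(new.encode("utf-8")))

# Cost of storing only the difference against the previous version.
diff_text = "".join(
    difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
    )
)
delta_size = len(zlib.compress(diff_text.encode("utf-8")))

print("full copy (bytes):", full_size)
print("delta only (bytes):", delta_size)
```

The exact numbers depend on the input, but the delta for a one-line
change stays small no matter how large the PO file grows.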

> ADSL is coping well with that... But there are still some places in the
> world which do not have access to wide bandwidth.

True.  But a Git clone (of the public repo) is a once-and-done
operation.  Anybody installing Koha for production use could use the
tarball or (even better) the Debian package.  Particularly because of
the Debian package, we're getting past the point where dev mode would
be recommended for use by single-library production installations.

I've been doing some measurements.  A PO-only repository would be
about 50M in size, and creating such a thing is the easy part.  But if
we move misc/translator/po to a separate repository, we would have to
also remove that directory from the main repository in order to
realize the repository size savings motivating your proposal -- a 'git
rm -r misc/translator/po' wouldn't reduce the size of the repo, since
every earlier version of those files would remain in the history.  My test
run is not quite finished yet (it takes a long time for
git-filter-branch to handle almost 13,000 commits), but even assuming
that 50M could be pared from the main repository, actually doing that
would come at a significant cost: every commit would be rewritten by
the git-filter-branch operation.  Rewriting history like that could
mean that every single person with a clone of the public repo has to
deal with forced branch updates, to say nothing of invalidating all of
the release tags.
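
To make the cost concrete, here is a toy Python model of why filtering
a directory out of history changes commit IDs.  It is a simplification
and not Git's real object format: the commit_id, build_history and
strip_po helpers and the "code-v1+po-v1" snapshot strings are invented
for the example.  The one property it borrows from Git is that a
commit's ID covers both its tree snapshot and its parent's ID.

```python
import hashlib
from typing import List, Optional


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Toy commit ID: a SHA-1 over the tree snapshot, the parent ID and the message."""
    body = f"tree {tree}\nparent {parent or 'none'}\n\n{message}\n"
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def build_history(trees: List[str]) -> List[str]:
    """Return the chain of commit IDs for a linear history of tree snapshots."""
    ids: List[str] = []
    parent: Optional[str] = None
    for n, tree in enumerate(trees):
        parent = commit_id(tree, parent, f"commit {n}")
        ids.append(parent)
    return ids


def strip_po(tree: str) -> str:
    """Stand-in for filtering misc/translator/po out of a snapshot."""
    return tree.split("+po")[0]


# Hypothetical history: the PO directory first appears in the second commit.
original_trees = [
    "code-v1",
    "code-v1+po-v1",
    "code-v2+po-v1",
    "code-v2+po-v2",
    "code-v3+po-v2",
]
original_ids = build_history(original_trees)
rewritten_ids = build_history([strip_po(t) for t in original_trees])

# True where the filtered history has a different ID for the same commit.
changed = [old != new for old, new in zip(original_ids, rewritten_ids)]
print(changed)
```

In this model only the commit that predates the directory keeps its ID.
Every later commit gets a new one, including those that never touched a
PO file themselves, because each ID depends on its parent's ID.  That
is why tags and everyone's existing branches would stop matching.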

That prospect doesn't hearten me.  I'll report back once my test finishes.

Regards,

Galen
-- 
Galen Charlton
gmcharlt at gmail.com