[Koha-devel] Koha Wiki migrated and upgraded

Katrin Fischer katrin.fischer.83 at web.de
Thu Oct 27 20:44:24 CEST 2022


Thanks a lot! :) Looking forward to using the new features and doing
some clean-up work!

On 27.10.22 14:29, Thomas Dukleth wrote:
> [For those not subscribed to, or not following, the Koha mailing list, I
> repeat the announcement here without cross-posting in a single message.]
>
> The Koha Wiki, now running MediaWiki Canasta, has been up for a few hours
> at the usual DNS subdomain https://wiki.koha-community.org .  The wiki is
> now up to date with MediaWiki 1.35.07 long term stable using a MySQL
> database and ElasticSearch with many fine enhancements such as
> VisualEditor, customised AdvancedSearch, and dynamic archiving of
> obsolete pages (which often still have useful information).  Please see
> further below for details.
>
> Unfortunately, the mail system on the wiki server, which is needed for
> resetting wiki login passwords, creating new login users, etc., had
> previously broken, and fixing it was overlooked amidst all the work which
> people have been doing for releasing a new Koha version.  Someone will
> send a message when the mail system is fixed.  If you know your existing
> Koha wiki login username and password, they will still work.
>
> There are probably many problems with result set relevance for search
> queries within the wiki.  We will fix them over time, and relevance
> ranking should automatically improve with use, although maintenance
> changes may often count as use, eroding recently updated relevance.  The
> wiki is now
> using ElasticSearch which is used by Wikipedia and has better extension
> support than database based indexing.  See some details about the
> customised AdvancedSearch and the need for careful consideration in
> improving search query indexing further below.
>
> Sitemap creation, which assists Google and other web indexing systems in
> indexing the Koha wiki, may not be working correctly in the way in which
> we have configured the MediaWiki Canasta Docker container.  Google and
> others can still index the content without the sitemap but the process
> functions better with a sitemap.  The Canasta Docker container does some
> things differently than the way they would function in a standard
> environment such that less effort should be required for maintenance tasks
> but we need a little more time to examine how some things such as sitemap
> creation are intended to function in Canasta.  We should always be able to
> use methods ordinarily used for sitemap creation in a standard environment
> if necessary.
>
> The Koha MediaWiki Canasta test instance should continue to be available
> for first testing significant changes and bug fixes, at
> https://wiki.test.koha-community.org .  Please do not make wiki
> contributions that you want to save in the MediaWiki Canasta test instance
> as they will not be carried over to the production wiki.  Continue to make
> lasting contributions to the production wiki at
> https://wiki.koha-community.org .
>
> Please read below for an understanding of what to expect before reporting
> issues of which we are already aware, such as the test database not being
> a current copy of the wiki and the mail system for resetting login
> passwords and creating new login users not working.
>
> You may report bugs to the bug "wiki needs updating to a later version",
> https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=23073 .
>
> WIKI DATABASE MIGRATION, UPGRADE, AND CONTAINER MANAGEMENT.
>
> Migrating the Koha MediaWiki database from Postgres to MySQL and upgrading
> to MediaWiki 1.35.07, the current long term stable version, used a
> repeatable process managed with a set of scripts which I developed in
> bash, Perl, and Python as appropriate for the task and, in the case of
> Python, to build on previous code.  Choosing Postgres as the database for
> a MediaWiki test instance left us with a mistaken database choice,
> complicating compatibility and future upgrades, when the MediaWiki test
> suddenly became the only Koha wiki running after the previous Koha wiki
> went down in the midst of a community schism with LibLime long ago.  The
> database migration and upgrade process has been developed and
> progressively tested since 2019 to ensure that the database is migrated
> correctly, etc.  I built the database migration process upon the
> originally incomplete and sometimes mistaken Python script of Philipp
> Spitzer, which was a fantastic proven starting point without which the
> task might have been somewhat too much.
>
> Mason James ran a web crawl and diff test to verify that the production
> and another test database migration and upgrade of the wiki had the same
> content except for evident changes where the production wiki had been
> updated with new content.
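
A minimal sketch of the kind of page-by-page comparison described above,
written in Python.  It is not the script which was actually used, which is
not shown in this message.  It assumes the page text from both wikis has
already been fetched into two mappings of title to text; the function name
diff_pages and the sample data are hypothetical.

```python
import difflib


def diff_pages(old_pages, new_pages):
    """Compare two {title: text} mappings and return per-page unified diffs.

    Only pages whose text differs, or which exist on one side only, appear
    in the result.
    """
    report = {}
    for title in sorted(set(old_pages) | set(new_pages)):
        old_text = old_pages.get(title, "")
        new_text = new_pages.get(title, "")
        if old_text == new_text:
            continue
        diff = difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"old/{title}",
            tofile=f"new/{title}",
            lineterm="",
        )
        report[title] = list(diff)
    return report


if __name__ == "__main__":
    # Hypothetical sample data standing in for crawled page content.
    old = {"PageA": "unchanged text", "PageB": "first version"}
    new = {"PageA": "unchanged text", "PageB": "second version"}
    for title, lines in diff_pages(old, new).items():
        print(title)
        print("\n".join(lines))
```

Pages which differ only because the production wiki received new content
would still appear in such a report and would need to be reviewed by hand.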
>
> The database was imported to MediaWiki Canasta which Tomás Cohen Arazi
> identified and customised to connect to the Koha Portainer Docker
> container management to provide MediaWiki in a Docker container with a
> large set of important extensions to help make managing the MediaWiki
> software easier.  See https://github.com/CanastaWiki for more about
> MediaWiki Canasta.
>
> After a minimal final testing period with Canasta in Koha Portainer, I
> marked the old wiki instance, which had been using Postgres, read-only and
> proceeded with the database migration and upgrade using an up to date copy
> of the database.  Tomás made further modifications in Portainer and Chris
> Cormack redirected the DNS record for wiki.koha-community.org to the
> current server for the wiki.
>
> Although you may never notice an interruption in service for the wiki, we
> may have to restart it to fix things which function a little differently
> and are a little more complicated to fix in the MediaWiki Canasta Docker
> container than in a standard operating system environment.  Once fixed,
> maintenance of a Docker container should be easier than a standard
> environment.  We may even move the server or some functions back to the
> server where the wiki had been hosted for years thanks to Galen Charlton
> and Equinox.
>
> See further below for a little about other modifications which I made to
> support dynamic archiving, etc.
>
> KNOWN ISSUES.
>
> The mail system on the server for the wiki for resetting wiki login
> passwords, or creating new logins, etc. had previously become broken and
> needs fixing as a matter of priority.
>
> The test instance at https://wiki.test.koha-community.org has a copy of the
> database which will become outdated over time.  In future, we may set up a
> process to update the database periodically.  The purpose of the test
> instance is to support testing significant wiki changes or wiki bug fixes
> first without the hazard of harming the production wiki.  Bug fixes need
> testing and can at least temporarily break the wiki.  Significant changes
> may fail to work as expected and might not be easily undone particularly
> if the changes have been created by a script for mass editing.  Having the
> latest wiki revisions is not usually needed for testing.
>
> There have been bugs specific to MediaWiki Canasta rearranging some
> standard files for the Docker container which have been addressed.
> However, there remain at least some Canasta-specific bugs relating to
> the Docker container environment.  Please report any instances of "Error
> creating thumbnail: Unable to save thumbnail to destination" which I found
> in the Koha History page https://wiki.koha-community.org/wiki/History .
> Instances of the bug can be fixed with command line shell access by
> removing the images/thumb/$buggy_image_name subdirectory for the image and
> making a non-changing edit to the page, which allows MediaWiki to recreate
> the images/thumb subdirectory without a problem, and the bug goes away.
> We should probably remove all the subdirectories in the images/thumb
> directory proactively.  Yet why is there a special problem for the Docker
> container which does not exist in other test instances not using a Docker
> container?  The container environment runs as the root user, as is
> standard for Docker containers, and the root user should have all the
> permissions necessary to access or create a thumbnail directory.  Changing
> the ownership of the images directory and subdirectories back and forth to
> test the effect temporarily broke a test instance of the wiki until the
> container was restarted.
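
The thumbnail directory removal step could be sketched in Python as
follows.  This is an illustration under stated assumptions, not a tested
procedure for the Canasta container: the function name remove_thumb_dirs
is hypothetical, the path to the images/thumb directory must be supplied
by the caller, and the search is recursive on the assumption that
thumbnail directories may sit below hashed subdirectories.

```python
import shutil
from pathlib import Path


def remove_thumb_dirs(thumb_root, image_name):
    """Delete every directory named image_name found below thumb_root.

    Returns the list of removed paths.  The name is compared literally
    rather than as a glob pattern, so file names containing characters
    such as "[" are handled safely.
    """
    # Collect the matches first so deletion does not disturb the walk.
    matches = [
        path
        for path in Path(thumb_root).rglob("*")
        if path.is_dir() and path.name == image_name
    ]
    removed = []
    for path in matches:
        # A match nested inside an already removed match no longer exists.
        if path.exists():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

After removal, a non-changing edit to the affected page is still needed so
that MediaWiki recreates the thumbnail directory.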
>
> MAJOR ENHANCEMENTS FROM UPGRADING.
>
> The VisualEditor extension used by Wikipedia is a WYSIWYG and guided forms
> aid for visually editing the underlying wikitext for a page and using
> guided forms for adding some features to a page.  Users can switch back
> and forth between source editing in all wikitext syntax and VisualEditor;
> however, it may be best to save the current edit before switching back and
> forth to avoid problems of imperfect correspondence between wikitext
> syntax and the VisualEditor model of wikitext.
>
> The AdvancedSearch extension used by Wikipedia is helpful for a user
> friendly interface to construct search queries and modify them by removing
> terms which appear in a bubble with an [x] to remove the term.
> AdvancedSearch depends on ElasticSearch which performs remarkably well in
> testing and allows the wiki to be reindexed in a couple of minutes if
> necessary.  See further below for modifications to the AdvancedSearch
> extension.
>
> SemanticMediaWiki was reinstalled after copying the upgraded database.
> Modifying the AdvancedSearch extension in conjunction with special
> AdvancedSearch navigation links and custom queries using carefully managed
> standard wiki categories may be more helpful than SemanticMediaWiki.
> Furthermore, anyone experimenting with SemanticMediaWiki should be aware
> that verbose syntax is required to avoid breakage after forthcoming
> MediaWiki updates, which will remove a deprecated hook upon which most
> wikis using SemanticMediaWiki commonly rely.  Wikipedia does not use
> SemanticMediaWiki and thus some
> MediaWiki developers may not have given sufficient consideration to
> managing the issue.  The workaround may involve a potential performance
> deficit when using SemanticMediaWiki search queries.
>
> The MassEditRegex extension has the power one might hope for from its name
> for using regular expressions to modify a list of pages.  However, given
> its power, it remains commented out in LocalSettings.php for the
> production system.  Use is intended to be for some special group of users
> such as wiki administrators; however, even they should be most strongly
> cautioned to first test their process on a test instance of the wiki.
> Furthermore, use should be with a bot subaccount set up by the user so
> that the mass changes may be identified as the work of a bot process and
> may avoid adversely affecting page modification priorities in search
> result sets.  The creation of user bot subaccounts should be documented.
> In testing, MassEditRegex works fantastically well for adding categories
> to the bottom of pages and templates to the top of pages, which can be
> done without risk of an inadequately debugged regular expression breaking
> page content in the middle.
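
To illustrate why anchoring at the very top or bottom of a page is low
risk, here is a minimal sketch using Python regular expressions.  It is
not MassEditRegex syntax, which is entered through the extension's own
form; the function names are hypothetical.  The anchors \A and \Z match
only at the start and end of the whole text, so the middle of the page is
never touched.

```python
import re


def add_template_to_top(wikitext, template):
    """Insert {{template}} on its own line before the start of the page."""
    prefix = "{{" + template + "}}\n"
    # \A matches only at the very start; a function is used as the
    # replacement so backslashes in the template name are taken literally.
    return re.sub(r"\A", lambda match: prefix, wikitext, count=1)


def add_category_to_bottom(wikitext, category):
    """Append [[Category:category]], replacing any trailing whitespace."""
    suffix = "\n[[Category:" + category + "]]\n"
    # \s*\Z matches trailing whitespace (possibly none) at the very end.
    return re.sub(r"\s*\Z", lambda match: suffix, wikitext, count=1)


if __name__ == "__main__":
    page = "Some page text.\n\nMore text.\n"
    page = add_template_to_top(page, "Obsolete")
    page = add_category_to_bottom(page, "Obsolete")
    print(page)
```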
>
> MODIFIED FEATURES.
>
> I modified the following to support dynamic archiving in which obsolete
> content does not appear by default for search results unless the user goes
> directly to the advanced search page without following provided navigation
> links or changes the default VectorMod skin affecting the basic search
> box.
>
> ADVANCEDSEARCH EXTENSION WITH MODIFICATIONS.
>
> The AdvancedSearch extension has been modified to include two additional
> form elements: one for excluding particular categories and another for
> excluding particular templates.  These additional elements appear in the
> user friendly AdvancedSearch term bubbles which can be individually
> removed from a query by clicking on the [x] for the particular bubble.
>
> Editing the non-English localisation files is still pending.  For
> languages for which a non-English localisation file has not been edited,
> the custom fields for category and template exclusion display a
> description in English.
>
> DeepCategory searching for subcategories of a category is disabled because
> it requires a SPARQL database and is only updated on a weekly basis for
> Wikipedia.  Searching subcategories of a category should be less of an
> issue with faceted use of categories which we should be carefully moving
> towards.
>
> Excluding particular categories supports dynamic archiving by supporting
> search queries excluding obsolete pages with -incategory:"Obsolete", which
> is automatically invoked from the navigation link "Advanced Search
> current" or from the simple search box when using the modified Vector
> skin, VectorMod.  Obsolete pages are also noted with a prominent notice
> using the Obsolete template.  Such pages should be updated if they can be,
> but are otherwise available to consult, most importantly for valuable
> information they often contain which is not yet present in current pages.
> Archived obsolete pages can be found exclusively by following the
> navigation link "Advanced search obsolete archive" which includes
> incategory:"Obsolete" automatically.
>
> The result set for search queries with incategory:"Obsolete" can be used
> to identify the type of pages which should have the Obsolete category and
> Obsolete template but do not yet, such as installation information for
> some particular old Debian versions.  Various combinations of including
> and excluding categories and templates can be easily used in the modified
> AdvancedSearch to find pages which have only one of the Obsolete category
> or the Obsolete template, which should be used together, or both be
> removed if the page has been updated to be current.
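
The consistency check described above amounts to a set difference.  A
minimal sketch in Python, assuming the two lists of page titles have
already been obtained from the corresponding search result sets; the
function name and the sample titles are hypothetical.

```python
def mismatched_obsolete_pages(in_category, with_template):
    """Return titles which have only one of the category or the template."""
    in_category = set(in_category)
    with_template = set(with_template)
    return {
        # In the Obsolete category but lacking the Obsolete template.
        "category_only": sorted(in_category - with_template),
        # Using the Obsolete template but lacking the Obsolete category.
        "template_only": sorted(with_template - in_category),
    }


if __name__ == "__main__":
    # Hypothetical titles for illustration only.
    result = mismatched_obsolete_pages(
        ["Old install notes", "Old upgrade notes"],
        ["Old upgrade notes", "Old printing notes"],
    )
    print(result)
```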
>
> All wiki pages should have some category even if it may be
> [[Category:Empty]] for people uncertain of what may be appropriate in the
> moment.  Pages missing categories may not be disappearing from query
> results by category when using ElasticSearch indexing as they had been
> when using database based search indexing.  We can also query for pages
> missing categories using
> https://wiki.koha-community.org/w/index.php?title=Special:UncategorizedPages
> and correct the issue, which has been neglected because migrating and
> upgrading the wiki has been the priority, with much less time available
> otherwise, especially since the pandemic.
>
> We should take some care when thinking about faceted category use as no
> wiki software uses fielded categories.  Thus there may be no concise way
> to query for pages which address a topic in a general way or supplement
> other documentation on a topic containing a lone category such as
> [[Category:Circulation]], if we then have many other pages with
> [[Category:RFCs]] and [[Category:Circulation]] but no longer
> [[Category:Circulation RFCs]] as a possible change for faceting.  In such
> an example, the search results of a query for incategory:"Circulation"
> might have a result set in which pages for RFCs relating to circulation
> issues containing both [[Category:RFCs]] and [[Category:Circulation]]
> might crowd out more generally helpful pages with [[Category:Circulation]]
> alone.  The problem may indicate a need for a navigation link to exclude
> RFCs from a search query; designating old RFCs as obsolete; or both.
> Alternatively or additionally, we may be able to adjust the weighting of
> the ElasticSearch indexing options such that pages containing
> [[Category:RFCs]] have a lower weight and appear further down the result
> set or pages with a single category such as [[Category:Circulation]] alone
> or some particular additional categories such as
> [[Category:Documentation]] have higher weight and appear further up the
> result set.
>
> VECTORMOD SKIN.
>
> Users are free to choose their own preferred MediaWiki skin and we can add
> others.  VectorMod is merely set as the default to help people avoid
> obsolete pages when submitting search queries from the simple search box
> which appears on every page.
>
> VectorMod is a custom version of the Vector skin which includes a modified
> version of Vector/includes/templates/SearchBox.mustache supporting dynamic
> archiving of obsolete content by excluding pages which have been
> designated obsolete by automatically adding -incategory:"Obsolete" to
> basic search queries.  The incategory syntax requires using
> ElasticSearch.  Previously, I replaced the SearchBox.mustache file in the
> Vector skin directly, which certainly worked without the extra effort of
> creating a custom skin.
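
The effect of the modified search box can be sketched in Python as
follows.  This only illustrates appending the exclusion term to a query
and building a Special:Search URL; it is not the contents of the modified
SearchBox.mustache template, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Search term which hides pages in the Obsolete category.
EXCLUDE_OBSOLETE = '-incategory:"Obsolete"'


def current_only_query(user_query):
    """Return user_query with the Obsolete exclusion term appended once."""
    terms = user_query.strip()
    if EXCLUDE_OBSOLETE.lower() in terms.lower():
        return terms
    return f"{terms} {EXCLUDE_OBSOLETE}".strip()


def search_url(index_php, user_query):
    """Build a Special:Search URL for the filtered query."""
    params = {
        "title": "Special:Search",
        "search": current_only_query(user_query),
    }
    return f"{index_php}?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("https://wiki.koha-community.org/w/index.php", "holds"))
```

A user who wants obsolete pages included only needs to remove the
exclusion term from the query.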
>
> Automatically inserting -incategory:"Obsolete" in the basic search box is
> now somewhat elegant in conjunction with the modified AdvancedSearch
> extension as it uses explanatory language labels with a bubble which has a
> removal [x] and allows autocompletion of query terms.
>
> Significant renaming of references to Vector as VectorMod and vector as
> vectormod has been scripted, which allows both Vector and VectorMod to be
> loaded and available to users.
>
>
> Thomas Dukleth
> Agogme
> 109 E 9th Street, 3D
> New York, NY  10003
> USA
> http://www.agogme.com
> +1 212-674-3783
>
>
> _______________________________________________
> Koha-devel mailing list
> Koha-devel at lists.koha-community.org
> https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website : https://www.koha-community.org/
> git : https://git.koha-community.org/
> bugs : https://bugs.koha-community.org/

