[Koha-devel] Solr Search use cases

Wed Jan 5 22:01:06 CET 2011

Working in consultation with Claire Hernandez and Henri-Damien Laurent
from BibLibre, I have replaced the test queries use cases Google Docs
spreadsheet with tables in the wiki.

Claire said that she would provide redirection links from the Google Docs
spreadsheet for those missing this announcement.

1.  EASE OF EDITING TABLES IN WIKI SYNTAX.

There can be difficulties using wiki syntax to edit tables.  The problem
is mostly related to losing track of the column while editing content in a
row.  That problem is corrected by editing each cell of the table on its
own line in wiki syntax as recommended on the MediaWiki and Wikipedia help
pages for editing tables.

The simplest MediaWiki table illustrates.

{| border="1"
| Column 1 Heading
| Column 2 Heading
| Column 3 Heading
|-
|row 1 column 1 content
|row 1 column 2 content
|row 1 column 3 content
|-
|row 2 column 1 content
|row 2 column 2 content
|row 2 column 3 content
|}

Empty cells are fine.  Line breaks are generally ignored in wiki syntax;
therefore, the separate lines for individual cells in a row are
transformed into the correct HTML for a table row.
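
For anyone who would rather generate such a table than type it, the
one-cell-per-line layout can be sketched in Python as follows.  This is
only an illustration: the function name render_wiki_table is my own
invention and is not part of Koha or MediaWiki.  It puts a space after
each cell marker, on the assumption that this keeps content beginning
with "-" or "}" from being read as a row or table marker.

```python
def render_wiki_table(headings, rows):
    """Return MediaWiki table markup with every cell on its own line.

    render_wiki_table is a hypothetical helper for illustration only.
    """
    lines = ['{| border="1"']
    for heading in headings:
        lines.append(("| " + heading).rstrip())
    for row in rows:
        # Refuse rows with the wrong number of cells: losing track of the
        # column is the very mistake the one-cell-per-line layout avoids.
        if len(row) != len(headings):
            raise ValueError(
                "row has %d cells, expected %d" % (len(row), len(headings))
            )
        lines.append("|-")
        for cell in row:
            # An empty cell becomes a bare "|" on its own line.
            lines.append(("| " + cell).rstrip())
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_wiki_table(
        ["Column 1 Heading", "Column 2 Heading", "Column 3 Heading"],
        [
            ["row 1 column 1 content", "", "row 1 column 3 content"],
            ["row 2 column 1 content", "row 2 column 2 content", ""],
        ],
    ))
```

The output can be pasted into a wiki page as is.  A row with a missing
or extra cell raises an error instead of silently shifting columns.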

If editing a table in wiki syntax is easy enough, then the main reason for
using a Google Docs spreadsheet for adding test queries is gone.  Use of
non-free Google Docs software is thus avoided.

2.  LINKS.

The main page for test queries is at
http://wiki.koha-community.org/wiki/Solr/Lucene_Test_Queries_for_Koha . 
Pages which actually contain the tests are merely linked from that page
and from the Switch to Solr RFC for Solr/Lucene queries.

Solr/Lucene query syntax documentation and BibLibre work in progress
demonstrations available for testing are linked from
http://wiki.koha-community.org/wiki/Solr/Lucene_Documentation .

Test query pages which have been created include the following.

The main page for Solr/Lucene test queries is at
http://wiki.koha-community.org/wiki/Solr/Lucene_Test_Queries_for_Koha .

Simple user test queries are at
http://wiki.koha-community.org/wiki/Simple_User_Test_Queries_for_Koha_with_Solr/Lucene
.  Simple user test queries are not expressed in Solr/Lucene query syntax.
See the wiki page, where I created a couple of test cases by way of
explanation.

SimpleServer Z39.50/SRU test queries are at
http://wiki.koha-community.org/wiki/SimpleServer_as_a_Z39.50/SRU_Server_Test_Queries_for_Koha_with_Solr/Lucene
.

I created links from the main test queries page for test query pages for
other options including OPAC metasearch with Solr/Lucene, nozebra, Zebra,
etc.  I do not have a near term intention to populate such pages myself;
however, anyone interested may start them.

3.  ORGANISING TEST QUERIES.

After much work examining the existing table of test queries and thinking
about what would be needed for the most demanding use cases, I found that
the main table needed subdivision to help ensure that we have some tests
which properly isolate particular functionality.  Tests also need to
combine functionality, and almost every test inevitably combines some
functionality.

The tests have been somewhat organised in one table but I found it
difficult to determine at a glance what had and had not been tested.  Even
with such a small number of test cases, the organising principle which may
have been used when starting seemed to be drifting as is inevitable for
all organisational efforts.  Good organisation merely provides
functionally useful organising principles and constrains the rate of
organisational drift.

3.1.  TEST CASE COMPLETENESS.

The easiest way to ensure that we do not miss testing important
functionality is to define the functionality in advance in a taxonomy of
possible functionality.  I used the most complete existing taxonomy of
query functionality, as defined in Z39.50/SRU, as a model.  I did not
include all the details but enough basic categories to organise what we
have and guide us to avoid missing tests for important functionality.  It
does not matter that no record indexing system will support every type of
query possible.  Knowing what can be implemented and testing for that is
important.

Having conducted the exercise of categorising the tests, I note that we
have no tests for some basic features.  The categories which I defined are
not detailed enough to list everything within particular tests.
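
The check for categories without tests could also be done mechanically.
The following is a minimal Python sketch, assuming each test query is
recorded with a category label.  The category names and queries in the
example are placeholders which I made up for illustration; they are not
the categories actually used on the wiki pages.

```python
UNCLASSIFIED = "unclassified"


def coverage_report(taxonomy, tests):
    """Count tests per taxonomy category.

    Tests with a missing or unrecognised category are counted under
    UNCLASSIFIED rather than being dropped.
    """
    counts = {category: 0 for category in taxonomy}
    counts[UNCLASSIFIED] = 0
    for test in tests:
        category = test.get("category", UNCLASSIFIED)
        if category not in counts:
            category = UNCLASSIFIED
        counts[category] += 1
    return counts


def untested_categories(taxonomy, tests):
    """Return the taxonomy categories which have no test at all."""
    counts = coverage_report(taxonomy, tests)
    return [category for category in taxonomy if counts[category] == 0]


if __name__ == "__main__":
    # Hypothetical taxonomy and test records, for illustration only.
    taxonomy = ["truncation", "stemming", "phrase"]
    tests = [
        {"query": "placeholder query 1", "category": "truncation"},
        {"query": "placeholder query 2"},  # no category given
    ]
    print(coverage_report(taxonomy, tests))
    print(untested_categories(taxonomy, tests))
```

Defining the taxonomy before counting is what makes the gaps visible: a
category with a count of zero appears in the report even though nobody
has written a test for it.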

3.2.  REDUCING ORGANISATIONAL DRIFT.

Separating the organisational parts into separate tables should help
reduce the rate of drift into disorganisation.  I have included an
unclassified table as a place to put queries where there is uncertainty
about the appropriate place for a new test query.

Most importantly I expect the number of tests to grow greatly over time
for particular functionality which may be difficult to implement without
bugs or for which it is difficult to construct a single comprehensive
test.  The bugs or incompleteness may be in underlying dependencies and
not in any aspect of the Koha implementation.

I certainly do not expect any stemming automation to work perfectly, for
example, and I imagine accumulating examples from several different
languages where we need to know the limits of functionality.  We may best
identify such issues upstream and pass our findings upstream but we should
be able to organise the problem ourselves.

Growth in what needs organising poses challenges for maintaining a
functionally useful order.  If we come to have hundreds of tests for
stemming in different languages, for example, we could move them to a
different page and link to upstream treatment.

3.3.  ORGANISATIONAL COMPLETENESS.

Only the main Solr/Lucene test queries page has a high degree of
organisational completeness in the taxonomy of possible queries at a
mostly consistent level of detail.

Much remains to be added in a similar manner to the page for SimpleServer
Z39.50/SRU test queries.  However, supporting many query options for a
Z39.50/SRU server for the world to use is much less important than
supporting every possible query option for the OPAC which has far greater
use.

3.4.  UNCLASSIFIED TABLE.

We do not want to lose any test queries which people might hesitate to
add because of uncertainty about the organisation modelled on Z39.50/SRU.
An unclassified test queries table is provided at the end of each page
for those who have trouble classifying their test queries and no time to
improve the clarity of the organisation.

4.  ADDITIONAL TABLE FORMATTING.

The formatting of the tables in the wiki could be improved but I may not
have time for a while with my other commitments.

Tables could have actual titles instead of relying upon wiki section
headings.

Columns might be more uniform in width from one table to another instead
of varying by the content of a particular table.

Colour coding could distinguish work.  Rows for which the feature test has
been developed could appear in green, for example.  [There might also be a
disadvantage to colour coding: people adding new rows to the table without
observing how the colour coding functions might inadvertently copy colour
coding which does not apply to a newly added row.  An inline comment
saying not to copy this bit when adding a new row might help.]
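
A colour coded row with such a warning might look like the output of the
following Python sketch.  It assumes that the wiki accepts a style
attribute on the row marker and strips HTML comments from the rendered
page, as MediaWiki normally does.  The function name render_row, the
shade of green and the wording of the comment are only my suggestions.

```python
# Hypothetical warning text and colour, chosen for illustration only.
WARNING = (
    "<!-- Do not copy the style attribute on the next line "
    "when adding a new row. -->"
)
DEVELOPED_STYLE = 'style="background-color: #ccffcc;"'


def render_row(cells, developed=False):
    """Return wiki markup for one table row, one cell per line.

    Rows whose feature test has been developed get a green background,
    preceded by an inline comment warning editors not to copy it.
    """
    lines = []
    if developed:
        lines.append(WARNING)
        lines.append("|- " + DEVELOPED_STYLE)
    else:
        lines.append("|-")
    for cell in cells:
        lines.append(("| " + cell).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_row(["test query", "expected result"], developed=True))
    print(render_row(["another test query", ""]))
```

Someone copying an undeveloped row gets a plain row marker with nothing
to remove, so the risk is confined to rows which carry the warning.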

Additional columns could be added if we need them in future such as
priority and links to particular bug reports.

Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY  10003
USA
http://www.agogme.com
+1 212-674-3783

On Mon, December 27, 2010 19:05, Thomas Dukleth wrote:
> Reply inline:
>
>
> On Thu, December 23, 2010 13:41, LAURENT Henri-Damien wrote:
>> Le 23/12/2010 14:35, LAURENT Henri-Damien a écrit :
>>> Hi,
>>> we proposed in our solr meeting to gather the use cases that you would
>>> like to add and proove with the solr search.
>>> We wondered why nobody had added any yet....
>>> And we realized that the document, though public, would not be
>>> editable.
>>> This has been fixed now.
>>> So all the persons who were longing to add usecases are now able to do
>>> so. The link is the same as previously publicised.
>>> https://spreadsheets.google.com/pub?key=0AuZF5Y_c4pIxdEVzTjUtUGFoQnFpSkpfbTU5Ykc3b2c&hl=en&output=html
>>>
>>> Please, enter whatever query you have success or problem with on zebra.
>>> We will try and do it with solr and then build the user interface for
>>> it.
>>> Hope that helps.
>> Still failing to edit.
>> This latest one should do :
>> http://tinyurl.com/3ydnuhw
>> If anyone wants to test and tell me if it is ok now.
>
> Yesterday, I had only an error pop-up dialog box at one minute or several
> second intervals in multiple web browsers.
>
> Today, I had no edit options for the spreadsheet.
>
> I added an alternative use cases table to the BibLibre Solr RFC,
> http://wiki.koha-community.org/wiki/Switch_to_Solr_RFC#How_To_Help , for
> those having difficulty editing the Google Docs spreadsheet.
>

[...]