[Koha-devel] How can I improve this report?

dcook at prosentient.com.au dcook at prosentient.com.au
Mon Jul 4 02:07:27 CEST 2022


Hey Doug,

 

In terms of analyzing queries, I find MySQL/MariaDB to be far more opaque than PostgreSQL, but in theory the “EXPLAIN” and “ANALYZE” keywords can help you out with this one[1]. 

 

Overall, I’d say it depends on what you’re trying to do and your data set. I have databases with hundreds of thousands of authority records, and for those I’d probably write a custom script to query Zebra since it’s already indexed the data. 

 

--

[1]

Using koha-testing-docker, here is the query you provided:

 

MariaDB [koha_kohadev]> ANALYZE SELECT DISTINCT(author) AS heading FROM biblio WHERE author NOT IN (SELECT ExtractValue(marcxml,'//datafield[@tag="100"]/subfield[@code="a"]') AS heading FROM auth_header WHERE authtypecode='PERSO_NAME') O

RDER BY heading;

+------+--------------------+-------------+------+---------------+------+---------+------+------+--------+----------+------------+----------------------------------------------+

| id   | select_type        | table       | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                                        |

+------+--------------------+-------------+------+---------------+------+---------+------+------+--------+----------+------------+----------------------------------------------+

|    1 | PRIMARY            | biblio      | ALL  | NULL          | NULL | NULL    | NULL | 436  | 436.00 |   100.00 |      16.28 | Using where; Using temporary; Using filesort |

|    2 | DEPENDENT SUBQUERY | auth_header | ALL  | NULL          | NULL | NULL    | NULL | 1560 | 653.58 |   100.00 |       0.13 | Using where                                  |

+------+--------------------+-------------+------+---------------+------+---------+------+------+--------+----------+------------+----------------------------------------------+

2 rows in set (1.380 sec)

 

Here is another one I cooked up which apparently is marginally faster:

 

MariaDB [koha_kohadev]> ANALYZE SELECT DISTINCT(biblio.author) AS heading, derived.ah FROM biblio  LEFT JOIN (SELECT ExtractValue(marcxml,'//datafield[@tag="100"]/subfield[@code="a"]') as ah FROM auth_header WHERE authtypecode='PERSO_NAM

E') AS derived ON biblio.author = derived.ah WHERE derived.ah is null and biblio.author is not null ORDER BY biblio.author;

+------+-------------+-------------+------+---------------+------+---------+------+------+---------+----------+------------+-------------------------------------------------+

| id   | select_type | table       | type | possible_keys | key  | key_len | ref  | rows | r_rows  | filtered | r_filtered | Extra                                           |

+------+-------------+-------------+------+---------------+------+---------+------+------+---------+----------+------------+-------------------------------------------------+

|    1 | SIMPLE      | biblio      | ALL  | NULL          | NULL | NULL    | NULL | 436  | 436.00  |   100.00 |      75.00 | Using where; Using temporary; Using filesort    |

|    1 | SIMPLE      | auth_header | ALL  | NULL          | NULL | NULL    | NULL | 1560 | 1706.00 |   100.00 |       0.06 | Using where; Using join buffer (flat, BNL join) |

+------+-------------+-------------+------+---------------+------+---------+------+------+---------+----------+------------+-------------------------------------------------+

2 rows in set (1.145 sec)

 

 

 

David Cook

Senior Software Engineer

Prosentient Systems

Suite 7.03

6a Glen St

Milsons Point NSW 2061

Australia

 

Office: 02 9212 0899

Online: 02 8005 0595

 

From: Koha-devel <koha-devel-bounces at lists.koha-community.org> On Behalf Of Doug Kingston
Sent: Saturday, 2 July 2022 5:39 AM
To: koha-devel <koha-devel at lists.koha-community.org>
Subject: [Koha-devel] How can I improve this report?

 

Report: Authors not in Authorities

 

SELECT DISTINCT(author) AS heading FROM biblio WHERE author NOT IN (SELECT ExtractValue(marcxml,'//datafield[@tag="100"]/subfield[@code="a"]') AS heading FROM auth_header WHERE authtypecode='PERSO_NAME')ORDER BY heading

 

I am not a SQL expert, but I believe this is doing an unkeyed troll through every authority record for every biblio (O(Nbibs * Mauths), which is pretty bad).  I suspect we can do better by trolling Auths once to create a set and then compare Bib authors against it.  How can we do this?

 

-Doug-

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.koha-community.org/pipermail/koha-devel/attachments/20220704/20381820/attachment.htm>


More information about the Koha-devel mailing list