[Koha-devel] How can I improve this report?
dcook at prosentient.com.au
dcook at prosentient.com.au
Mon Jul 4 02:07:27 CEST 2022
Hey Doug,
In terms of analyzing queries, I find MySQL/MariaDB to be far more opaque than PostgreSQL, but in theory the “EXPLAIN” and “ANALYZE” keywords can help you out with this one[1].
Overall, I’d say it depends on what you’re trying to do and your data set. I have databases with hundreds of thousands of authority records, and for those I’d probably write a custom script to query Zebra since it’s already indexed the data.
--
[1]
Using koha-testing-docker, here is the query you provided:
MariaDB [koha_kohadev]> ANALYZE SELECT DISTINCT(author) AS heading FROM biblio WHERE author NOT IN (SELECT ExtractValue(marcxml,'//datafield[@tag="100"]/subfield[@code="a"]') AS heading FROM auth_header WHERE authtypecode='PERSO_NAME') O
RDER BY heading;
+------+--------------------+-------------+------+---------------+------+---------+------+------+--------+----------+------------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+--------------------+-------------+------+---------------+------+---------+------+------+--------+----------+------------+----------------------------------------------+
| 1 | PRIMARY | biblio | ALL | NULL | NULL | NULL | NULL | 436 | 436.00 | 100.00 | 16.28 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | auth_header | ALL | NULL | NULL | NULL | NULL | 1560 | 653.58 | 100.00 | 0.13 | Using where |
+------+--------------------+-------------+------+---------------+------+---------+------+------+--------+----------+------------+----------------------------------------------+
2 rows in set (1.380 sec)
Here is another one I cooked up which apparently is marginally faster:
MariaDB [koha_kohadev]> ANALYZE SELECT DISTINCT(biblio.author) AS heading, derived.ah FROM biblio LEFT JOIN (SELECT ExtractValue(marcxml,'//datafield[@tag="100"]/subfield[@code="a"]') as ah FROM auth_header WHERE authtypecode='PERSO_NAM
E') AS derived ON biblio.author = derived.ah WHERE derived.ah is null and biblio.author is not null ORDER BY biblio.author;
+------+-------------+-------------+------+---------------+------+---------+------+------+---------+----------+------------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+-------------+-------------+------+---------------+------+---------+------+------+---------+----------+------------+-------------------------------------------------+
| 1 | SIMPLE | biblio | ALL | NULL | NULL | NULL | NULL | 436 | 436.00 | 100.00 | 75.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | auth_header | ALL | NULL | NULL | NULL | NULL | 1560 | 1706.00 | 100.00 | 0.06 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------------+------+---------------+------+---------+------+------+---------+----------+------------+-------------------------------------------------+
2 rows in set (1.145 sec)
David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia
Office: 02 9212 0899
Online: 02 8005 0595
From: Koha-devel <koha-devel-bounces at lists.koha-community.org> On Behalf Of Doug Kingston
Sent: Saturday, 2 July 2022 5:39 AM
To: koha-devel <koha-devel at lists.koha-community.org>
Subject: [Koha-devel] How can I improve this report?
Report: Authors not in Authorities
SELECT DISTINCT(author) AS heading FROM biblio WHERE author NOT IN (SELECT ExtractValue(marcxml,'//datafield[@tag="100"]/subfield[@code="a"]') AS heading FROM auth_header WHERE authtypecode='PERSO_NAME')ORDER BY heading
I am not a SQL expert, but I believe this is doing an unkeyed troll through every authority record for every biblio (O(Nbibs * Mauths), which is pretty bad). I suspect we can do better by trolling Auths once to create a set and then compare Bib authors against it. How can we do this?
-Doug-
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.koha-community.org/pipermail/koha-devel/attachments/20220704/20381820/attachment.htm>
More information about the Koha-devel
mailing list