[Koha-bugs] [Bug 27078] Starman hanging in 3-node Koha cluster when 1 node goes offline.

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Tue Nov 24 15:35:03 CET 2020


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27078

--- Comment #9 from Christian McDonald <rcmcdonald91 at gmail.com> ---
Thanks for the replies... I'm still mulling it over.

Latest Findings:

** htop is uninformative in this case. I don't see any memory or CPU spikes.

** Elasticsearch is healthy even with one node offline (3-node cluster):

$ curl localhost:9200/_cluster/health
{"cluster_name":"koha_es","status":"green","timed_out":false,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":10,"active_shards":20,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}
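
As a rough sketch of how this check could be scripted rather than eyeballed
(this was not run as part of the investigation), the health response can be
reduced to the few fields that matter here. The function names, the
EXPECTED_NODES constant, and the "serving" rule are illustrative assumptions;
the sample is the response pasted above, trimmed to the fields used.

```python
import json
from urllib.request import urlopen

EXPECTED_NODES = 3  # assumption: the full cluster size described in this report

# Trimmed copy of the _cluster/health response shown above.
SAMPLE = (
    '{"cluster_name":"koha_es","status":"green","timed_out":false,'
    '"number_of_nodes":2,"number_of_data_nodes":2,'
    '"active_primary_shards":10,"active_shards":20,"unassigned_shards":0}'
)


def summarize_health(health, expected_nodes=EXPECTED_NODES):
    """Reduce a _cluster/health response to the fields relevant to this bug."""
    nodes = health["number_of_nodes"]
    return {
        "status": health["status"],
        "nodes_present": nodes,
        "nodes_missing": max(expected_nodes - nodes, 0),
        "unassigned_shards": health["unassigned_shards"],
        # Assumption: "green" or "yellow" without a timeout counts as serving.
        "serving": health["status"] in ("green", "yellow")
        and not health["timed_out"],
    }


def fetch_health(base_url="http://localhost:9200", timeout=5.0):
    """Fetch live cluster health; defined for completeness, not called here."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


print(summarize_health(json.loads(SAMPLE)))
```

The explicit timeout in fetch_health is deliberate: a monitoring call should
not itself hang if a node stops answering.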

** MariaDB is healthy too with one node offline (3-node cluster):

MariaDB [(none)]> show status like '%wsrep%';

I'm not going to paste the output here, but it clearly indicates that the
failed node is NOT actively participating in the cluster and that latency
between nodes is basically <1ms, i.e. 0/0/0/0/0.
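
Since the output is not pasted, here is a minimal sketch of the kind of check
it would support, assuming the wsrep rows have already been loaded into a dict
by whatever database client is at hand. The check_wsrep helper and the min_size
threshold are hypothetical, and the values in the example dict are
illustrative, not the actual output from this cluster.

```python
# Galera status variables and the values usually expected on a healthy node.
EXPECTED = {
    "wsrep_cluster_status": "Primary",
    "wsrep_ready": "ON",
    "wsrep_connected": "ON",
    "wsrep_local_state_comment": "Synced",
}


def check_wsrep(status, min_size=2):
    """Return a list of problems found in SHOW STATUS LIKE '%wsrep%' rows."""
    problems = []
    for name, wanted in EXPECTED.items():
        actual = status.get(name)
        if actual != wanted:
            problems.append(f"{name} is {actual!r}, expected {wanted!r}")
    size = int(status.get("wsrep_cluster_size", 0))
    if size < min_size:
        problems.append(f"wsrep_cluster_size is {size}, expected >= {min_size}")
    return problems


# Hypothetical values for a 3-node cluster with one node down.
example = {
    "wsrep_cluster_status": "Primary",
    "wsrep_ready": "ON",
    "wsrep_connected": "ON",
    "wsrep_local_state_comment": "Synced",
    "wsrep_cluster_size": "2",
}
print(check_wsrep(example) or "no problems found")
```

An empty list here would match what is described above: the surviving nodes
still form a primary component and the failed node has simply dropped out.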

When I run tcpdump filtered for traffic to or from the offline node, I only
see ARP requests. The command is 'tcpdump host koha01.lab.mydomain.com'.

Still investigating.
