[Koha-devel] Optimizing Starman startup
Ere Maijala
ere.maijala at helsinki.fi
Tue Apr 27 07:27:33 CEST 2021
Thanks for digging up all the information. I think adding a caching
validator would provide a nice improvement, so I'm in favor of it, and
if I were to decide, I'd also include it in Koha.
Still, getting it upstream would be great in the long run, so I'd
support that as well.
--Ere
dcook at prosentient.com.au wrote on 27.4.2021 at 5.34:
> After more experimenting, I have concluded that
> JSON::Validator->validate() is the culprit in terms of CPU time and
> memory usage.
>
> Fortunately, I’ve determined that Mojo::JSON and Digest::MD5 can be used
> together to create consistent reproducible checksums, which could be
> used for caching validated schemas.
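
To make the checksum idea concrete, here is a minimal sketch of how such a cache key might be computed and used. It is only an illustration under assumptions: JSON::PP from core Perl is swapped in for Mojo::JSON so that it runs without Mojolicious, and `schema_checksum`, `validate_once` and the `%validated` hash are hypothetical names, not anything in JSON::Validator.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use JSON::PP ();

# canonical(1) sorts hash keys, so equal data always serialises to the
# same bytes and therefore to the same checksum.
my $json = JSON::PP->new->utf8->canonical(1);

sub schema_checksum {
    my ($schema) = @_;
    return md5_hex( $json->encode($schema) );
}

my %validated;    # hypothetical in-memory cache: checksum => 1

# Run the expensive validation callback only for schemas not seen before.
# The callback is a stand-in for the real validation call and is expected
# to die on an invalid schema, so failures are never cached.
sub validate_once {
    my ( $schema, $validate_cb ) = @_;
    my $key = schema_checksum($schema);
    return 'cached' if $validated{$key};
    $validate_cb->($schema);
    $validated{$key} = 1;
    return 'validated';
}

# Two structurally equal specs whose keys were written in a different order.
my $calls  = 0;
my $spec_a = { swagger => '2.0', paths => { '/patrons' => {} } };
my $spec_b = { paths => { '/patrons' => {} }, swagger => '2.0' };

my $first  = validate_once( $spec_a, sub { $calls++ } );
my $second = validate_once( $spec_b, sub { $calls++ } );
print "$first, $second, validation ran $calls time(s)\n";
```

An in-memory hash only helps within one process; for it to help across Starman workers the cache would have to live somewhere shared, which is a design question this sketch does not try to answer.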
>
> Of course, a solution would involve changes to JSON::Validator (and
> possibly Mojolicious::Plugin::OpenAPI depending on the chosen solution),
> and then we’d have to wait for the new and improved version to come
> downstream, so we wouldn’t see the benefit of this for years.
>
> That said… we could always roll our own JSON::Validator. And if we don’t
> want to do it as a community, I could always just do it myself.
>
> In terms of testing… with 18 CPUs I can restart 60 instances (120
> processes) and get through the app setup in about 60 seconds with
> significant server load, when using validation. Without validation, I
> can do it in 20 seconds without significant server load (beyond a few
> short-lived CPU spikes).
>
> I’m thinking about writing a patch and sending a pull request for
> JSON::Validator, but also really thinking about implementing it locally
> too at least in the meantime.
>
> I haven’t heard from the author of JSON::Validator for a little while
> now, but I hope I do hear back from him. I think it would be a great
> addition to the library.
>
> David Cook
>
> Software Engineer
>
> Prosentient Systems
>
> Suite 7.03
>
> 6a Glen St
>
> Milsons Point NSW 2061
>
> Australia
>
> Office: 02 9212 0899
>
> Online: 02 8005 0595
>
> *From:*Koha-devel <koha-devel-bounces at lists.koha-community.org> *On
> Behalf Of *dcook at prosentient.com.au
> *Sent:* Monday, 26 April 2021 7:01 PM
> *To:* 'Renvoize, Martin' <martin.renvoize at ptfs-europe.com>
> *Cc:* 'Koha Devel' <koha-devel at lists.koha-community.org>
> *Subject:* Re: [Koha-devel] Optimizing Starman startup
>
> After some more experimenting, it’s clear that the problem isn’t
> JSON::Validator::OpenAPI::Mojolicious or
> Koha::REST::Plugin::PluginRoutes. If you exclude
> Mojolicious::Plugin::OpenAPI, the startup is very fast. It’s 30 seconds
> start to finish to restart 60 instances and each instance restarts very
> quickly.
>
> When using Mojolicious::Plugin::OpenAPI, it takes about 3 minutes and
> there’s a fair bit of downtime during that time.
>
> When I do a strace, I’m noticing that a process can spend 30 seconds
> just allocating memory for Mojolicious::Plugin::OpenAPI, but it only
> happens once you hit a certain volume of processes. If you’re just
> starting up 1 or 2, then it’s only a couple seconds. But if you have say
> 60-120 processes, it can take up to 30 seconds for
> Mojolicious::Plugin::OpenAPI to do its work. I’m putting 10 CPUs to this
> work, but clearly that’s not enough. I imagine there may be other
> bottlenecks accessing the memory as well.
>
> Has anyone profiled Mojolicious before? I’m guessing maybe Martin?
>
> I suspect that this is just a problem that I’m going to have to live
> with but maybe it is a case where I can find a way to optimize
> Mojolicious::Plugin::OpenAPI.
>
> *From:*Koha-devel <koha-devel-bounces at lists.koha-community.org> *On
> Behalf Of *dcook at prosentient.com.au
> *Sent:* Monday, 26 April 2021 5:12 PM
> *To:* 'Renvoize, Martin' <martin.renvoize at ptfs-europe.com>
> *Cc:* 'Koha Devel' <koha-devel at lists.koha-community.org>
> *Subject:* Re: [Koha-devel] Optimizing Starman startup
>
> So I just tried the following…
>
> --
>
> root@kohadevbox:koha(master)$ npm install -g swagger-cli
> /usr/bin/swagger-cli -> /usr/lib/node_modules/swagger-cli/swagger-cli.js
> npm WARN @apidevtools/swagger-parser@10.0.2 requires a peer of
> openapi-types@>=7 but none is installed. You must install peer
> dependencies yourself.
> + swagger-cli@4.0.4
> added 46 packages from 27 contributors in 8.203s
>
> --
>
> root@kohadevbox:koha(master)$ time swagger-cli bundle \
>     api/v1/swagger/swagger.json --outfile api/v1/swagger/openapi.json \
>     --type json
> Created api/v1/swagger/openapi.json from api/v1/swagger/swagger.json
>
> real 0m0.296s
> user 0m0.346s
> sys 0m0.032s
>
> openapi.json is 10891 lines long but it actually contains 741 $ref lines
> like "$ref": "#/definitions/error" and "$ref":
> "#/definitions/patron_extended_attribute".
>
> --
>
> Now to do some benchmarking… I ran the following code:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use JSON::Validator::OpenAPI::Mojolicious;
>
> # Bundle the spec, replacing each $ref inline (replace => 1).
> my $validator = JSON::Validator::OpenAPI::Mojolicious->new;
> my $spec = $validator->bundle({
>     replace => 1,
>     schema  => "api/v1/swagger/swagger.json",
> });
>
> The first time I ran it, it took 1.343 seconds. The second and
> subsequent times it took .354 seconds. (That’s using Ubuntu 20.04 and
> JSON::Validator 3.14.) That suggests caching, although I’m not sure where.
> I don’t see anything obvious in /usr/share/perl5/JSON/Validator/cache.
>
> Trying with openapi.json yields .280 seconds instead of .354 seconds.
> It’s faster, but not significantly.
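
For comparing first and later runs inside one process, a small timing helper along these lines could be used. This is a sketch only: `time_runs` is a hypothetical name, and the workload is a stand-in that you would replace with the bundle() call from the script above. Timing in-process separates the cost of the call itself from the cost of starting perl and loading modules, which may account for part of a slow first run.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run $code $runs times and return the wall-clock seconds of each run.
sub time_runs {
    my ( $runs, $code ) = @_;
    my @elapsed;
    for ( 1 .. $runs ) {
        my $start = [gettimeofday];
        $code->();
        push @elapsed, tv_interval($start);
    }
    return @elapsed;
}

# Hypothetical stand-in workload; swap in the real bundle() call here.
my $work = sub {
    my %h = map { $_ => $_ * 2 } 1 .. 50_000;
    return scalar keys %h;
};

my @t = time_runs( 3, $work );
printf "run %d: %.3fs\n", $_ + 1, $t[$_] for 0 .. $#t;
```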
>
> So that suggests that the problem is actually with
> Koha::REST::Plugin::PluginRoutes or Mojolicious::Plugin::OpenAPI more
> specifically…
>
> *From:*dcook at prosentient.com.au <dcook at prosentient.com.au>
> *Sent:* Monday, 26 April 2021 11:24 AM
> *To:* 'Renvoize, Martin' <martin.renvoize at ptfs-europe.com>
> *Cc:* 'Koha Devel' <koha-devel at lists.koha-community.org>
> *Subject:* RE: [Koha-devel] Optimizing Starman startup
>
> I think that I accidentally offended him, as he hasn’t responded to me
> since his initial response.
>
> I do wonder if reducing the number of references would help, although I
> wonder how easy that would be to do in practice. It looks like we have
> about 5767 lines of JSON all up as is… so it would probably get even
> bigger if we dereferenced them.
>
> Oh… here’s a thought. Why don’t we compile it? According to
> https://davidgarcia.dev/posts/how-to-split-open-api-spec-into-multiple-files/,
> you can maintain many different files, and then use something like
> swagger-cli to create a single built/compiled OpenAPI file.
>
> That way JSON::Validator wouldn’t need to resolve any references for the
> core API. I don’t know if the plugins have any $ref in them but I’m
> guessing not (just based on Coverflow). So that could be a big win.
>
> I’m working on other things at the moment, but I’m going to put that on
> my eternal list.
>
> *From:*Renvoize, Martin <martin.renvoize at ptfs-europe.com>
> *Sent:* Friday, 23 April 2021 5:24 PM
> *To:* David Cook <dcook at prosentient.com.au>
> *Cc:* Koha Devel <koha-devel at lists.koha-community.org>
> *Subject:* Re: [Koha-devel] Optimizing Starman startup
>
> Jan's code is certainly challenging to read and understand at times, I
> agree. I used to contribute to the plugin a number of years ago, but
> the project that gave me time to play with that has since been sold
> on, so I'm not involved at the level I used to be. He uses lots of Perl
> foo, which often takes me a long time to wrap my head around.
>
> As for the refs, I think we split our spec up too much, in all honesty.
> Even the Swagger spec suggests we went too far. I think I might have
> been unclear when I first pushed for a split from one massive file. We
> could/should certainly reduce that somewhat. It'll be interesting to
> see if it makes much difference; that could be a fairly quick win.
>
> On Thu, 22 Apr 2021, 12:32 am, <dcook at prosentient.com.au> wrote:
>
> Hi Ere,
>
> I think you're right about the refs. While they get resolved by the
> OpenAPI plugin, you probably have to resolve them before trying to
> dynamically inject the routes from plugins.
>
> Jan Thorsen (the author of Mojolicious::Plugin::OpenAPI and
> JSON::Validator) thinks that the ref resolution is actually what's
> taking so long. I looked it up and I think we have over 400
> different references in the main OpenAPI spec alone. I haven't
> profiled it but something to think about.
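
One way to put a number on the references before profiling would be to walk the decoded spec and count the `$ref` entries. The following is a rough sketch under assumptions: `collect_refs` is a hypothetical helper, and the tiny inline spec stands in for the real swagger.json, which you would decode first (for example with JSON::PP's decode_json).

```perl
use strict;
use warnings;

# Recursively walk a decoded JSON structure and tally every "$ref" target.
# Returns a hashref of target => number of times it is referenced.
sub collect_refs {
    my ( $node, $seen ) = @_;
    $seen //= {};
    if ( ref $node eq 'HASH' ) {
        for my $key ( keys %$node ) {
            my $value = $node->{$key};
            if ( $key eq '$ref' && !ref $value ) {
                $seen->{$value}++;
            }
            else {
                collect_refs( $value, $seen );
            }
        }
    }
    elsif ( ref $node eq 'ARRAY' ) {
        collect_refs( $_, $seen ) for @$node;
    }
    return $seen;
}

# Tiny illustrative spec, not the Koha one.
my $spec = {
    paths => {
        '/patrons' => { get => { responses => {
            200 => { schema => { '$ref' => '#/definitions/patron' } },
            404 => { schema => { '$ref' => '#/definitions/error' } },
        } } },
        '/holds' => { get => { responses => {
            404 => { schema => { '$ref' => '#/definitions/error' } },
        } } },
    },
    definitions => {
        patron => { type => 'object' },
        error  => { type => 'object' },
    },
};

my $refs     = collect_refs($spec);
my $distinct = keys %$refs;
my $total    = 0;
$total += $_ for values %$refs;
print "$total references to $distinct distinct targets\n";
```

The split between total and distinct counts matters because a resolver that caches each target would pay per distinct target, while one that does not would pay per reference.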
>
> At some point, I'm going to have a play with newer versions of the
> modules. I'm going to look at Ubuntu 20.04 and newer Debian versions
> to see what I can get away with in terms of newness. Needs more
> investigation, but I am really hoping that this is an issue that can
> be solved by just upgrading the OS.
>
> I find Jan's code to be unnecessarily opaque (could use more
> descriptive comments and function naming) but... I'll investigate.
> Probably not right away as I have a bunch of other priorities that I
> have to address but... this is on my mind.
>
> Starman startup time is probably the thing about Koha annoying me
> the most right now and probably the most practical thing I can
> improve at the moment...
>
> -----Original Message-----
> From: Ere Maijala <ere.maijala at helsinki.fi>
> Sent: Wednesday, 21 April 2021 6:31 PM
> To: dcook at prosentient.com.au; koha-devel at lists.koha-community.org
> Subject: Re: [Koha-devel] Optimizing Starman startup
>
> Hi David,
>
> I wish I could remember all the details, but my memory fails me. I think
> not using JSON had something to do with how the refs are resolved.
> That may or may not have been the reason, but if everything works
> with the JSON module, I can't think of a reason not to use it.
>
> Thanks for taking a look!
>
> --Ere
>
> dcook at prosentient.com.au wrote on 21.4.2021 at 3.28:
> > Hi Ere,
> >
> > Thanks for your reply. 24700 looks much better. I'll look at
> > backporting it locally.
> >
> > Although I'm looking at JSON::Validator::OpenAPI::Mojolicious at
> > https://metacpan.org/pod/release/JHTHORSEN/Mojolicious-Plugin-OpenAPI-2.19/lib/JSON/Validator/OpenAPI/Mojolicious.pm
> > and it says "Do not use this module directly. Use
> > Mojolicious::Plugin::OpenAPI instead." I notice that you're using
> > the "bundle" method. Do we really need that there? Why don't we just
> > load the JSON using the JSON module, merge with the plugin spec
> > files, and then pass it to the OpenAPI plugin? Shouldn't the plugin
> > take care of the $ref replacement?
> >
> > Hmm... I didn't realize until now that the OpenAPI plugin was
> > doing a validate behind the scenes. That's tricky.
> >
> > At a glance, we might be able to pre-load the app into the Starman
> > master process pre-fork. There are warnings about doing that with
> > open database connections, so we'd need to review plack.psgi, but a
> > quick glance suggests it might be OK. (Alternatively, I have wondered
> > about running the REST API as a separate process apart from Starman
> > using hypnotoad. According to
> > https://docs.mojolicious.org/Mojolicious/Guides/Cookbook,
> > Mojo::Server::Prefork preloads the application in the manager/master
> > process, and Hypnotoad is based off that, so that would help.)
> >
> > It does seem like changes to the OpenAPI plugin would be needed
> > for caching.
> >
> > I'm going to try backporting your change and try pre-loading and
> > see how far that gets me.
> >
> > -----Original Message-----
> > From: Koha-devel <koha-devel-bounces at lists.koha-community.org> On
> > Behalf Of Ere Maijala
> > Sent: Tuesday, 20 April 2021 4:48 PM
> > To: koha-devel at lists.koha-community.org
> > Subject: Re: [Koha-devel] Optimizing Starman startup
> >
> > Hi,
> >
> > I did some work on improving it here:
> >
> > https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=24700
> >
> > That shaved a good bit of time from it, but it's still a heavy
> > operation, and it would make sense to
> >
> > 1.) avoid doing it too often
> >
> > 2.) cache the results and avoid doing it if results are cached
> >
> > If you could address the first one, that'd go a long way. I'm
> > afraid the second one would require changes to the OpenAPI plugin to
> > support caching.
> >
> > --Ere
> >
> > dcook at prosentient.com.au wrote on 20.4.2021 at 6.15:
> >> Hi all,
> >>
> >> Do you despair when you see the following periodically in “top” when
> >> a starman worker is recreated?
> >>
> >>   PID USER    PR NI   VIRT    RES   SHR S  %CPU %MEM   TIME+ COMMAND
> >>  9529 my-koha 20  0 460108 197212 17172 R 100.0  0.4 0:03.41 /usr/share/koha/api/v1/app.pl
> >>
> >> Or the following in top when you install koha-common package or
> >> restart the koha-common service?
> >>
> >> 11101 1-koha  20  0 447232 193320 16076 R  10.6  0.4 0:09.09 /usr/share/koha/api/v1/app.pl
> >> 11168 1-koha  20  0 447240 193264 16056 R  10.6  0.4 0:08.72 /usr/share/koha/api/v1/app.pl
> >> 11306 2-koha  20  0 447220 193148 16000 R  10.6  0.4 0:08.07 /usr/share/koha/api/v1/app.pl
> >> 11543 2-koha  20  0 447232 193036 15828 R  10.6  0.4 0:07.07 /usr/share/koha/api/v1/app.pl
> >> 11784 3-koha  20  0 441536 189664 16172 R  10.6  0.4 0:06.04 /usr/share/koha/api/v1/app.pl
> >> 11830 3-koha  20  0 439548 187212 15748 R  10.6  0.4 0:05.82 /usr/share/koha/api/v1/app.pl
> >> 11831 4-koha  20  0 438620 186344 15748 R  10.6  0.4 0:05.81 /usr/share/koha/api/v1/app.pl
> >> 11853 4-koha  20  0 437680 185672 16000 R  10.6  0.4 0:05.79 /usr/share/koha/api/v1/app.pl
> >>
> >> Well, I still have a lot of investigation left to do, but I notice
> >> that one place where a lot of time is taken is here (per worker):
> >>
> >> my $validator = JSON::Validator::OpenAPI::Mojolicious->new;
> >> $validator->load_and_validate_schema(
> >>     $self->home->rel_file("api/v1/swagger/swagger.json"),
> >>     {
> >>         allow_invalid_ref => 1,
> >>     }
> >> );
> >>
> >> push @{$self->routes->namespaces}, 'Koha::Plugin';
> >>
> >> my $spec = $validator->schema->data;
> >> $self->plugin(
> >>     'Koha::REST::Plugin::PluginRoutes' => {
> >>         spec      => $spec,
> >>         validator => $validator
> >>     }
> >> );
> >>
> >> $self->plugin(
> >>     OpenAPI => {
> >>         spec  => $spec,
> >>         route => $self->routes->under('/api/v1')->to('Auth#under'),
> >>         allow_invalid_ref =>
> >>           1,    # required by our spec because $ref directly under
> >>                 # Paths-, Parameters-, Definitions- & Info-object
> >>                 # is not allowed by the OpenAPI specification.
> >>     }
> >> );
> >>
> >> Anyone have ideas for improving this? Do we have to validate the
> >> schema every time? Can we move the schema validation into a
> >> different module and preload it into Starman using the -M flag so
> >> that it’s done 1 time per Starman master instance rather than 1 time
> >> per Starman worker instance?
> >>
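
The once-per-master idea can be illustrated without Starman at all. The sketch below is only a stand-in under assumptions: it uses plain fork() in place of Starman's worker management, and `build_validated_spec` is a hypothetical placeholder for the expensive load-and-validate step. It shows the property being asked for, namely that work done before forking is inherited by every worker rather than repeated.

```perl
use strict;
use warnings;

my $parent_pid  = $$;
my $build_count = 0;

# Stand-in for the expensive, once-only work (loading and validating the spec).
sub build_validated_spec {
    $build_count++;
    return { built_by => $$, paths => { '/patrons' => {} } };
}

# Done once in the parent, before any worker exists.
my $spec = build_validated_spec();

my @reports;
for my $worker ( 1 .. 3 ) {
    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Worker: reuse the inherited $spec instead of rebuilding it.
        close $reader;
        print {$writer} join( ',', $$, $spec->{built_by}, $build_count ), "\n";
        close $writer;
        exit 0;
    }

    # Parent: collect the worker's one-line report, then reap it.
    close $writer;
    my $line = <$reader>;
    close $reader;
    waitpid( $pid, 0 );
    chomp $line;
    push @reports, $line;
}

print "worker_pid,built_by_pid,build_count\n";
print "$_\n" for @reports;
```

Every worker reports the parent's PID as the builder and a build count of 1. Anything that must not be shared across a fork, such as open database handles, would still need to be created per worker.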
> >> I find “/usr/share/koha/api/v1/app.pl” to be the bane of
> >> deployments, as it puts a massive load on a server when you have
> >> multiple Koha instances on the server.
> >>
> >> _______________________________________________
> >> Koha-devel mailing list
> >> Koha-devel at lists.koha-community.org
> >> https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> >> website : https://www.koha-community.org/
> >> git : https://git.koha-community.org/
> >> bugs : https://bugs.koha-community.org/
> >>
> >
> > --
> > Ere Maijala
> > Kansalliskirjasto / The National Library of Finland
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
>
>
--
Ere Maijala
Kansalliskirjasto / The National Library of Finland