Discussion:
[Exist-open] eXist app through Apache proxy goes compute-bound
Craig A. Berry
2017-06-08 02:38:00 UTC
We have an eXist app (a now somewhat distant cousin of TEI Publisher) that has been running reasonably well for a couple of months on an AWS Linux instance, but last week started hanging. Access from the outside world is via an Apache proxy.

One of the eXist processes, according to top, now goes into a state shortly after start-up in which it consumes 180%-200% of the CPU on this 2-CPU instance. Initially the application still works, albeit slowly, but within somewhere between a couple of minutes and a couple of hours it stops responding at all. If I start Monitoring and Profiling immediately upon start-up, it will run briefly before getting disconnected, and it shows that there are no running jobs, no running queries, no recent queries, no waiting threads, and no active threads.

The problem only happens when the Apache proxy is running. If I don't start Apache and only access the application on the 8443 port, everything seems fine. I changed the proxy timeout from 60 seconds to 20 minutes and it had no effect; the problem started in well under 20 minutes of start-up. Whether the problem has anything to do with Apache per se or rather with something arriving from the outside world via Apache I don't know. That said, the only thing I see in the Apache access log is some robots following our links, but it's only two or three requests per minute, so it doesn't seem like that would overwhelm anything.

We upgraded eXist from 3.1.0 to 3.2.0 and observed no differences.

Has anyone seen anything like this or have any suggestions on how to debug it?

________________________________________
Craig A. Berry

"... getting out of a sonnet is much more
difficult than getting in."
Brad Leithauser
Joe Wicentowski
2017-06-08 11:10:07 UTC
Hi Craig,

Are there any clues in exist.log? Do the requests in the Apache access log
line up 1:1 with the requests in Jetty's access logs
($EXIST_HOME/tools/jetty/logs)?
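
One rough way to check whether the two logs line up is to count requests per minute in each and list the minutes where the counts differ. The sketch below is only an illustration: it assumes both logs use the NCSA common/combined layout with a bracketed timestamp followed by the quoted request line, and that both are written in the same time zone. The function names are made up for this example.

```python
import re
from collections import Counter

# Matches e.g.: [08/Jun/2017:02:38:00 +0000] "GET /some/path HTTP/1.1"
# Assumption: both Apache and Jetty write NCSA-style access log lines.
REQUEST = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)')


def requests_per_minute(lines):
    """Count logged requests per minute, keyed by 'dd/Mon/yyyy:HH:MM'."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            # The first 17 characters of the timestamp run up to the minute.
            counts[match.group("ts")[:17]] += 1
    return counts


def mismatched_minutes(apache_lines, jetty_lines):
    """Return {minute: (apache_count, jetty_count)} where the counts differ."""
    apache = requests_per_minute(apache_lines)
    jetty = requests_per_minute(jetty_lines)
    return {
        minute: (apache[minute], jetty[minute])
        for minute in sorted(set(apache) | set(jetty))
        if apache[minute] != jetty[minute]
    }
```

Passing the two open log files to mismatched_minutes() gives an empty dict when the per-minute counts agree. A request that straddles a minute boundary can show up as a harmless off-by-one.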

You might also try to grab JMX status snapshots when eXist is seizing up by
visiting the "/status" page directly and saving it. You will probably
need to append "?token=" followed by the token stored in
$EXIST_HOME/webapp/WEB-INF/data/jmxservlet.token (if memory serves). This
is what monex polls, but if you can post the one that you get just before
eXist becomes completely unresponsive this can contain clues about what's
going on.
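
A minimal sketch of how the snapshots could be collected automatically, so that the last one before the hang is already on disk. Everything here is an assumption for illustration: the base URL is whatever address your eXist instance answers on, the token file is assumed to hold either a bare token or a properties-style "token=..." line, and the function names are invented.

```python
import time
import urllib.parse
import urllib.request
from pathlib import Path


def read_token(text):
    """Extract the token from the token file's contents.

    Assumption: the file holds either a bare token or a 'token=...' line,
    possibly preceded by '#' comment lines.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "token":
                return value.strip()
            continue
        return line
    raise ValueError("no token found")


def status_url(base_url, token):
    """Build the '/status?token=...' URL under the given base URL."""
    query = urllib.parse.urlencode({"token": token})
    return base_url.rstrip("/") + "/status?" + query


def snapshot_name(epoch_seconds):
    """File name with a UTC timestamp; the response is assumed to be XML."""
    return time.strftime("status-%Y%m%dT%H%M%SZ.xml", time.gmtime(epoch_seconds))


def poll(base_url, token_file, out_dir, interval=10, count=30):
    """Save `count` snapshots, `interval` seconds apart, into `out_dir`."""
    url = status_url(base_url, read_token(Path(token_file).read_text()))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for _ in range(count):
        now = time.time()
        try:
            with urllib.request.urlopen(url, timeout=15) as response:
                (out / snapshot_name(now)).write_bytes(response.read())
        except OSError as exc:
            # Timeouts and refused connections end up here; the time at
            # which they start is itself a useful data point.
            print(snapshot_name(now), "failed:", exc)
        time.sleep(interval)
```

You would call poll() with the base URL of your instance, the path to jmxservlet.token, and an output directory. Polling eXist directly rather than through Apache keeps the proxy out of the measurement.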

Joe
Craig A. Berry
2017-06-08 23:20:21 UTC
Thanks for the reply.
Post by Joe Wicentowski
Are there any clues in exist.log?
As far as I can tell, only victims, not perpetrators. So, for example, I've seen a broken pipe here and there, but it seemed to be from after things went crazy.
Post by Joe Wicentowski
Do the requests in the Apache access log line up 1:1 with the requests in Jetty's access logs ($EXIST_HOME/tools/jetty/logs)?
Yes.
Post by Joe Wicentowski
You might also try to grab JMX status snapshots when eXist is seizing up by visiting the "/status" page directly and saving it. You will probably need to append "?token=" followed by the token stored in $EXIST_HOME/webapp/WEB-INF/data/jmxservlet.token (if memory serves). This is what monex polls, but if you can post the one that you get just before eXist becomes completely unresponsive this can contain clues about what's going on.
That's a good tip. It took me a couple minutes to find jmxservlet.token because we have our data in an alternate location, but following the trail from the configuration got me there.

This particular bug decided to go underground as soon as I announced its presence in public. We started things up so we could follow the recommended debugging suggestions, but everything has been working fine for some hours now. For the first time in over a week. Quite a puzzle, but I now have some things to try if it shows up again.
Adam Retter
2017-06-08 11:21:52 UTC
Hi Craig,

Two things come to mind:

1) Some sort of runaway process in eXist. You can use the 'jstack' tool
which ships with the JDK to get a point-in-time trace of exactly what eXist
is doing. You might want to take a few of these to compare its slow and
locked up states.

2) Apache overwhelming eXist with network requests, either for genuine user
reasons or perhaps due to a bad config causing some sort of feedback loop.
You can use tools like Wireshark or tcpdump to capture the network traffic
between Apache and eXist to help understand the interactions between
startup and when it all seems to stop responding.
Craig A. Berry
2017-06-08 23:51:51 UTC
Post by Adam Retter
Hi Craig,
1) Some sort of runaway process in eXist. You can use the 'jstack' tool which ships with the JDK to get a point-in-time trace of exactly what eXist is doing. You might want to take a few of these to compare its slow and locked up states.
Thanks. I'm not much of a Java person and had been thinking there must be some way to identify what's stuck in a loop the way you would do with dtrace or a debugger or some profiling tool with other languages. Now I know.
Post by Adam Retter
2) Apache overwhelming eXist with network requests, either for genuine user reasons or perhaps due to a bad config causing some sort of feedback loop. You can use tools like Wireshark or tcpdump to capture the network traffic between Apache and eXist to help understand the interactions between startup and when it all seems to stop responding.
We installed wireshark and tcpdump, but as I mentioned in my reply to Joe, the problem went away as soon as we started things up again to observe it in action. I've used wireshark or tcpdump once or twice in the past, and the problem is you tend to need a lot of knowledge about networking primitives in order to know what they are telling you. But still good to have available, and I will keep them in mind if this crops up again.
