Discussion:
[Exist-open] Lucene.lock after database crash - causes all indexes to fail
Legend of the Dragoon
2011-06-09 03:34:18 UTC
Permalink
Hi!

Thanks to the eXist team for helping through the xmldb:move corruption.  I haven't had any troubles since that day, but now I do again.  My database crashed. 

After the crash, a lucene.lock file was left in the data/lucene folder.  The database did not want to start until it was removed.  After the lock file was removed, the database started again but all of the XQueries that rely on configured indexes in collection.xconf no longer return anything.

This seems to include both lucene indexes and normal <qname> indexes.  All these queries run (so my app "works") but they do not find any results. 

This is using eXist 1.4.x. I will gather together some logs and provide more data, but wanted to know if anyone had an immediate idea on the issue.

Thanks,
Lothy
Adam Retter
2011-06-09 09:50:58 UTC
Permalink
Post by Legend of the Dragoon
After the crash, a lucene.lock file was left in the data/lucene folder.  The database did not want to start until it was removed.  After the lock file was removed, the database started again but all of the XQueries that rely on configured indexes in collection.xconf no longer return anything.
I am not sure if it is safe to just remove that file or not?
Post by Legend of the Dragoon
This seems to include both lucene indexes and normal <qname> indexes.  All these queries run (so my app "works") but they do not find any results.
This is using eXist 1.4.x.  I will gather together some logs and provide more data, but wanted to know if anyone had an immediate idea on the issue.
Does a reindex of the database solve the issue?
Post by Legend of the Dragoon
Thanks,
Lothy
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
Exist-open mailing list
https://lists.sourceforge.net/lists/listinfo/exist-open
--
Adam Retter

eXist Developer
{ United Kingdom }
***@exist-db.org
irc://irc.freenode.net/existdb
Legend of the Dragoon
2011-06-11 16:06:13 UTC
Permalink
Adam,

The problem has occurred again, so this time I took notes on the way through. In the past, I have tried re-indexing using the admin client and was not successful. However, this time, after a re-index, things seem to be restored properly. That is hopeful. Either way, I am providing the additional information here in case any other good plans (re-index takes a long time) can come to light, or there is a code fix that could be done.

I believe the cause of the problem is shutting down the server whenever Lucene is indexing something, which is alot.

Version is eXist 1.4.x, running in Tomcat 7.0. I have seen this in Jetty too, but the logs below come from Tomcat.

I am not sure about removing the file either--but the database will not come back up until all the lock files are removed (data/*.lock) and (data/lucene/write.lock)

Below, I have included a bunch of logging information that might help!

===============================
1. If the lock files have not been removed, the database instance will not become available.

File lock last access timestamp: Jun 11, 2011 C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\eXist\WEB-INF\data\journal.lck
Found a stale lockfile. Trying to remove it: C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\eXist\WEB-INF\data\journal.lck

AND....

SEVERE: Servlet /eXist threw load() exception
javax.servlet.ServletException: No database instance available
at org.exist.http.servlets.EXistServlet.init(EXistServlet.java:140)
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1189)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1103)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1010)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4935)
at org.apache.catalina.core.StandardContext$3.call(StandardContext.java:5262)
at org.apache.catalina.core.StandardContext$3.call(StandardContext.java:5257)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

=====================================
2. the lucene/write.lock seems to be cause the most trouble:


2011-06-11 11:36:39,890 [Thread-2] ERROR (BrokerPool.java [configure]:238) - Unable to initialialize database instance 'exist': Exception while reading lucene index directory: Lock obtain timed out: ***@C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\eXist\WEB-INF\data\lucene\write.lock
org.exist.util.DatabaseConfigurationException: Exception while reading lucene index directory: Lock obtain timed out: ***@C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\eXist\WEB-INF\data\lucene\write.lock
at org.exist.indexing.lucene.LuceneIndex.open(LuceneIndex.java:89)
at org.exist.indexing.IndexManager.<init>(IndexManager.java:86)
at org.exist.storage.BrokerPool.initialize(BrokerPool.java:771)
at org.exist.storage.BrokerPool.<init>(BrokerPool.java:654)
at org.exist.storage.BrokerPool.configure(BrokerPool.java:216)
at org.exist.storage.BrokerPool.configure(BrokerPool.java:188)
at org.exist.http.servlets.EXistServlet.startup(EXistServlet.java:672)
at org.exist.http.servlets.EXistServlet.init(EXistServlet.java:113)
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1189)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1103)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1010)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4935)
at org.apache.catalina.core.StandardContext$3.call(StandardContext.java:5262)
at org.apache.catalina.core.StandardContext$3.call(StandardContext.java:5257)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: ***@C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\eXist\WEB-INF\data\lucene\write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1550)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1525)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1290)
at org.exist.indexing.lucene.LuceneIndex.getWriter(LuceneIndex.java:144)
at org.exist.indexing.lucene.LuceneIndex.open(LuceneIndex.java:87)
... 18 more


===================================
3. Removing just the data/*.lock files still leaves the database down, on the lucene/write.lock exception (see #2).


===================================
4. Removing the data/lucene/write.lock lets the database come back up, but it comes up in a very unhappy place.

The application works, queries work, etc... however, ANY query that relies on an index (lucene or normal <qname>) fail to execute! In fact, from the recent testing, even simple queries using collection()/xpath fail too.

These queries just return nothing, but also do not throw any errors to the logs.

===================================
5. In this case, a re-index has seemed to solve the issue. The database is back up and working again. I have definitely tried this previously without positive results, and have been forced to revert to a restore point.

Thanks for any help,
Lothy
Roy Walter
2011-06-11 17:02:39 UTC
Permalink
Post by Legend of the Dragoon
I am not sure about removing the file either--but the database will not come back up until all the lock files are removed (data/*.lock) and (data/lucene/write.lock)
I had a similar problem. My error was, "database instance 'exist' is not
available". The following checklist provided by Joe W has saved my bacon
on several occasions:

1. Stop the running eXist database instance.
2. Change into directory EXIST_HOME/webapp/WEB-INF/data or another
directory you specified as data directory in the configuration
(conf.xml).
3. Delete all .log, .lck, .lock, and .dbx files *except*
collections.dbx,
dom.dbx, and symbols.dbx. Be sure to look in the lucene directory for
any .lock files, and delete them too.
4. Start eXist again. Assuming that cleared the block, you're all set
- just trigger a reindex to rebuild your indexes.

-- Roy
Roy Walter
2011-06-11 17:01:11 UTC
Permalink
Post by Legend of the Dragoon
I am not sure about removing the file either--but the database will not come back up until all the lock files are removed (data/*.lock) and (data/lucene/write.lock)
I had a similar problem. My error was, "database instance 'exist' is not
available". The following checklist provided by Joe W has saved my bacon
on several occasions:

1. Stop the running eXist database instance.
2. Change into directory EXIST_HOME/webapp/WEB-INF/data or another
directory you specified as data directory in the configuration
(conf.xml).
3. Delete all .log, .lck, .lock, and .dbx files *except*
collections.dbx,
dom.dbx, and symbols.dbx. Be sure to look in the lucene directory for
any .lock files, and delete them too.
4. Start eXist again. Assuming that cleared the block, you're all set
- just trigger a reindex to rebuild your indexes.

-- Roy
Dave Finton
2011-06-14 16:14:59 UTC
Permalink
One thing that I learned the hard way is that if I were to do a massive
update on an indexed data set, it's a good idea to turn off lucene indexing
temporarily (I accomplished this by removing the collection.xconf file from
/db/system/config). I inadvertently forgot to do this once and it cause the
disk device driver to report some rather interesting error messages
(indicating there was way too much disk activity going on). In our case we
had to do a restore from one of our backups, but we didn't try this
solution. I'm going to keep this tucked away if anything similar happens.
Post by Legend of the Dragoon
I am not sure about removing the file either--but the database will not come back up until all the lock files are removed (data/*.lock) and (data/lucene/write.lock)
I had a similar problem. My error was, "database instance 'exist' is not
available". The following checklist provided by Joe W has saved my bacon on
1. Stop the running eXist database instance.
2. Change into directory EXIST_HOME/webapp/WEB-INF/data or another
directory you specified as data directory in the configuration
(conf.xml).
3. Delete all .log, .lck, .lock, and .dbx files *except* collections.dbx,
dom.dbx, and symbols.dbx. Be sure to look in the lucene directory for
any .lock files, and delete them too.
4. Start eXist again. Assuming that cleared the block, you're all set
- just trigger a reindex to rebuild your indexes.
-- Roy
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
Exist-open mailing list
https://lists.sourceforge.net/lists/listinfo/exist-open
--
David Finton
Wolfgang Meier
2011-06-15 08:30:48 UTC
Permalink
Post by Dave Finton
One thing that I learned the hard way is that if I were to do a massive
update on an indexed data set, it's a good idea to turn off lucene indexing
temporarily (I accomplished this by removing the collection.xconf file from
/db/system/config).
It is true that creating the lucene index may account for 50% or more
of the time it takes to store or update a document. I will soon update
to lucene 3, which helps to reduce indexing times.

But again, I wonder if you define lucene indexes on too many nodes?
Normally, you only need a lucene index on higher level elements like
div's, head's, sometimes on paragraphs, but you should avoid defining
indexes on nested elements.

Wolfgang
Dave Finton
2011-06-15 14:33:35 UTC
Permalink
Sorry about the duplicate email, Wolfgang. I'm resending this because I
forgot to hit "Reply All" last time.

In our index, we use the following pattern:

*<text match="//sub_node_name/*"/>*

The structure of the document in question is a single file in a collection,
where looks like the following:

*<items>

<sub_node_name>
<data1>some-value</data1>
<data2>some-other-value</data2>
<data3>hello</data3>
<data4>world</data4>
<data5/>
<data6>the-last-value</data6>
</sub_node_name>

<sub_node_name>
... (same structure as above, with different text values in each of the
data nodes) ...
</sub_node_name>

... (repeat about a thousand to three thousand times) ...

</items>*

I think you may be on to something, Wolfgang. What I should have is the
following lucene indexing configuration:

*<text match="/items/sub_node_name"/>*
Dave Finton
2011-06-15 17:42:10 UTC
Permalink
I've confirmed an improvement. I now just simply index the top-level node
with:

*<text match="/*"/>*

In our collections, that matches exactly one node each (there's only one XML
document per collection in there). The re-index operation takes a
significantly shorter period of time now for our larger data sets.
Post by Dave Finton
Sorry about the duplicate email, Wolfgang. I'm resending this because I
forgot to hit "Reply All" last time.
*<text match="//sub_node_name/*"/>*
The structure of the document in question is a single file in a collection,
*<items>
<sub_node_name>
<data1>some-value</data1>
<data2>some-other-value</data2>
<data3>hello</data3>
<data4>world</data4>
<data5/>
<data6>the-last-value</data6>
</sub_node_name>
<sub_node_name>
... (same structure as above, with different text values in each of the
data nodes) ...
</sub_node_name>
... (repeat about a thousand to three thousand times) ...
</items>*
I think you may be on to something, Wolfgang. What I should have is the
*<text match="/items/sub_node_name"/>*
Wolfgang Meier
2011-06-15 08:23:50 UTC
Permalink
Hi,
Post by Legend of the Dragoon
I believe the cause of the problem is shutting down the server whenever Lucene is indexing something, which is alot.
The Lucene index sits in eXist's indexing pipeline and receives events
as documents are stored. The shutdown should actually wait until this
process has completed. If you kill it, you may end up with incomplete
indexes as well as documents.
Post by Legend of the Dragoon
4. Removing the data/lucene/write.lock lets the database come back up, but it comes up in a very unhappy place.
The application works, queries work, etc... however, ANY query that relies on an index (lucene or normal <qname>) fail to execute!  In fact, from the recent testing, even simple queries using collection()/xpath fail too.
This would indicate that it's not only lucene which has problems
accessing the indexes! The crash must have been rather fatal.

In the first place, I wonder why the db was killed the db at all?
Maybe you should re-check your index configuration. Creating lucene
indexes on too many nodes can lead to lucene using up too much memory
and eventually killing the entire db. In one case, users created a
lucene index on //*, which led to a 900mb index file for 9mb of data.
Creating such a nested index on every node normally does not make any
sense. You want to define the index on the top-level elements you plan
to query - everything below will be included automatically.

Wolfgang
Legend of the Dragoon
2011-06-15 12:54:05 UTC
Permalink
Wolfgang,

We actually only use one lucene index, on a top level document down one of our main data collections. But yes, its a pretty sizeable collection.

In this case, the reason the server shutdown is that I Stopped the Windows Service (Java wrapper in tools/wrapper) while the server was busy doing something. In other cases, we have seen the server keel for different reasons (JVM heap size) and had a similar issue with the lucene.lock.

The wrapper service does seem to eventually force kill the eXist process if you tell it to stop, even if a Xquery is running.

Any advice on that?

Thanks,
Lothy
Subject: Re: [Exist-open] Lucene.lock after database crash - causes all indexes to fail (lothy)
Date: Wednesday, June 15, 2011, 4:30 AM
Post by Dave Finton
One thing that I learned the
hard way is that if I were to do a massive
Post by Dave Finton
update on an indexed data set, it's a good idea to
turn off lucene indexing
Post by Dave Finton
temporarily (I accomplished this by removing the
collection.xconf file from
Post by Dave Finton
/db/system/config).
It is true that creating the lucene index may account for
50% or more
of the time it takes to store or update a document. I will
soon update
to lucene 3, which helps to reduce indexing times.
But again, I wonder if you define lucene indexes on too
many nodes?
Normally, you only need a lucene index on higher level
elements like
div's, head's, sometimes on paragraphs, but you should
avoid defining
indexes on nested elements.
Wolfgang
Loading...