[BUG] Getting into a rare and unrecoverable state after 5.0.21 #2526

hosadoya · 2024-07-28T08:41:21Z

Version
Which LiteDB version/OS/.NET framework version are you using. (REQUIRED)
LiteDB 5.0.21 / Windows / .NET 6

Describe the bug
A clear and concise description of what the bug is.
After updating the NuGet package to version 5.0.21, the application randomly gets into an unrecoverable state. Here are the relevant logs, in sequence:

The very first and only unique error:

LiteDB.LiteException: This transaction are invalid state
   at LiteDB.Engine.QueryExecutor.<>c__DisplayClass12_0.<<ExecuteQuery>g__RunQuery|2>d.MoveNext()
   at LiteDB.Utils.Extensions.EnumerableExtensions.OnDispose[T](IEnumerable`1 source, Action onDispose)+MoveNext()
   at LiteDB.Utils.Extensions.EnumerableExtensions.OnDispose[T](IEnumerable`1 source, Action onDispose)+MoveNext()
   at LiteDB.BsonDataReader..ctor(IEnumerable`1 values, String collection, EngineState state)
   at LiteDB.Engine.QueryExecutor.ExecuteQuery(Boolean executionPlan)
   at LiteDB.Engine.LiteEngine.Query(String collection, Query query)
   at LiteDB.LiteQueryable`1.ToDocuments()+MoveNext()
   at System.Linq.Enumerable.SelectEnumerableIterator`2.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

After this, we get 94 of the following errors from different threads:

LiteDB.LiteException: pages in memory store must be non-shared
   at LiteDB.BsonDataReader.Read()
   at LiteDB.LiteQueryable`1.ToDocuments()+MoveNext()
   at System.Linq.Enumerable.SelectEnumerableIterator`2.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

After that, we get thousands of these errors:

LiteDB.LiteException: Maximum number of transactions reached
   at LiteDB.Engine.TransactionMonitor.GetTransaction(Boolean create, Boolean queryOnly, Boolean& isNew)
   at LiteDB.Engine.QueryExecutor.ExecuteQuery(Boolean executionPlan)
   at LiteDB.Engine.LiteEngine.Query(String collection, Query query)
   at LiteDB.LiteQueryable`1.ToDocuments()+MoveNext()
   at System.Linq.Enumerable.SelectEnumerableIterator`2.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

This has already happened 3 times. The only way to recover was to restart the process.

Code to Reproduce
Write a small snippet to isolate your bug and could be possible to our team test. (REQUIRED)

This seems to be related to the recent OnDispose change, as that is where the first error originates. There is no repro, because this is a very random error that happens a few times per week. All we have are the detailed logs shown above.

Expected behavior
A clear and concise description of what you expected to happen.

No error/corruption.

Screenshots/Stacktrace
If applicable, add screenshots/stacktrace

NA

Additional context
Add any other context about the problem here.

The app runs many (at most 20) concurrent tasks in parallel, which perform CRUD operations on the LiteDB database in varying ways.

flier268 · 2024-07-30T06:04:12Z

@hosadoya You can downgrade to 5.0.17
mlockett42/litedb-async#34

RichardVogelij · 2024-09-23T04:56:49Z

5.0.21 / .net core 7.

This also happens to us about twice a week, in a setup where many concurrent tasks perform CRUD operations on the LiteDB database.

Downgrading is not an option as 5.0.21 fixes a (for us) common occurrence of "empty page must be defined as empty type" cases.

Regrettably, I cannot reproduce the issue on demand. The cause seems random, but it appears to be more prevalent when inserting data into an audit-log table in a background task.

I have not seen the initial "This transaction are invalid state" error, but I am not 100% certain it did not occur. I do, however, see a lot of "pages in memory store must be non-shared" errors starting all of a sudden.

When this happens, the entire LiteDB connection becomes unusable: at first we only get "pages in memory store must be non-shared", and eventually nothing but "Maximum number of transactions reached" errors on all subsequent queries.

Thankfully no data seems to be corrupted and restarting the app fixes the issue.

Since there seems to be no solution yet, I'm thinking about detecting this particular issue and destroying and recreating the LiteDB connection as a temporary workaround. I would, however, like to urge the LiteDB devs to take this one seriously.

Is there any suggestion other than downgrading?
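
For anyone considering the same stopgap, here is a minimal sketch of the "detect and recreate" idea. It is not LiteDB code: `RecreatingHolder<T>` and its members are hypothetical names, and matching on message text is an assumption based purely on the two error messages quoted in this thread.

```csharp
using System;

// Hypothetical helper, not part of LiteDB: owns one shared instance of T
// and replaces it when an operation fails with a known "poisoned" message.
public sealed class RecreatingHolder<T> : IDisposable where T : class, IDisposable
{
    // Message fragments copied from the errors quoted in this thread.
    private static readonly string[] FatalFragments =
    {
        "pages in memory store must be non-shared",
        "Maximum number of transactions reached",
    };

    private readonly Func<T> _factory;
    private readonly object _gate = new object();
    private T _current;

    public RecreatingHolder(Func<T> factory)
    {
        _factory = factory;
        _current = factory();
    }

    // Number of times the instance was rebuilt; useful for logging.
    public int RecreateCount { get; private set; }

    public TResult Execute<TResult>(Func<T, TResult> operation)
    {
        T snapshot;
        lock (_gate) { snapshot = _current; }

        try
        {
            return operation(snapshot);
        }
        catch (Exception ex) when (IsFatal(ex))
        {
            Recreate(snapshot);
            throw; // the caller decides whether to retry
        }
    }

    public static bool IsFatal(Exception ex)
    {
        foreach (var fragment in FatalFragments)
        {
            if (ex.Message.Contains(fragment)) return true;
        }
        return false;
    }

    private void Recreate(T failed)
    {
        lock (_gate)
        {
            // Another thread may already have swapped the instance.
            if (!ReferenceEquals(_current, failed)) return;
            failed.Dispose();
            _current = _factory();
            RecreateCount++;
        }
    }

    public void Dispose()
    {
        lock (_gate) { _current.Dispose(); }
    }
}
```

With LiteDB this would presumably be wired up as `new RecreatingHolder<LiteDatabase>(() => new LiteDatabase("app.db"))`, where the file name is a placeholder. Note that disposing the failed instance can still interrupt other threads that are mid-operation on it; since the instance is already unusable at that point, the sketch accepts that.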

megedsh · 2024-10-20T11:42:19Z

I had this issue in my application as well, but it turned out to be an issue with the way I shared the connection between all the threads doing the reading/writing work.
I'm not saying that this is what is happening in your app, but it is worth a look.

Before the error started, I changed the way my worker threads were getting the Database instance:
from each thread creating a new instance to a shared instance from a singleton provider.

In some places there was still a 'using' pattern around the Database instance.
When a 'using' block ends, the target object gets disposed,
i.e. one thread was disposing the connection while others were in the middle of using it.

The "This transaction are invalid state" exception did not point me in that direction, but after some debugging and removing some worker threads, I all of a sudden got the more informative 'Object disposed' exception.

After fixing the issue - by not disposing my shared database instance - the original "This transaction are invalid state" exception did not return.

Have a look at your app and see if anything is disposing a shared database instance in one of your worker threads.
Hope this helps.
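
To make the pitfall concrete, here is a small sketch of the pattern described above. It deliberately uses a stand-in class rather than LiteDB itself: `FakeDatabase`, `DatabaseProvider`, and `Workers` are hypothetical names invented for illustration.

```csharp
using System;

// Stand-in for a shared database handle (hypothetical, not a LiteDB type).
public sealed class FakeDatabase : IDisposable
{
    public bool IsDisposed { get; private set; }

    public int Count()
    {
        if (IsDisposed) throw new ObjectDisposedException(nameof(FakeDatabase));
        return 0;
    }

    public void Dispose() => IsDisposed = true;
}

// Hypothetical singleton provider handing the same instance to every caller.
public static class DatabaseProvider
{
    private static readonly Lazy<FakeDatabase> Shared =
        new Lazy<FakeDatabase>(() => new FakeDatabase());

    public static FakeDatabase Get() => Shared.Value;
}

public static class Workers
{
    // Bug: `using` disposes the shared instance when the block ends,
    // even though other workers still hold the same reference.
    public static int BrokenWorker()
    {
        using (var db = DatabaseProvider.Get())
        {
            return db.Count();
        }
    }

    // Fix: borrow the shared instance and leave its lifetime to the owner.
    public static int CorrectWorker()
    {
        var db = DatabaseProvider.Get();
        return db.Count();
    }
}
```

Once `BrokenWorker` has run a single time, every later `CorrectWorker` call fails, because both methods see the same disposed object. In a real application the shared instance would be disposed exactly once, at shutdown, by whatever owns the provider.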

RichardVogelij · 2024-10-21T06:25:31Z

Thanks for your two cents. Regrettably, I'm already doing that (sharing a single instance). I have not figured out the exact cause, but something is off. In the meantime I have been on a hunting spree and have settled on materializing all loops and never reusing a GetCollection result after an await when doing async work, which appears to have been a "solution" for us (the error happens less frequently).
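
To illustrate what "materializing all loops" means here, below is a sketch with a stand-in store instead of LiteDB. `FakeStore` and `Processing` are hypothetical; the assumption (suggested by the OnDispose frames in the stack trace at the top of this issue, but not verified against LiteDB's source) is that a lazy query keeps engine state open until its enumeration finishes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a store whose lazy queries hold a cursor open.
public sealed class FakeStore
{
    private int _openCursors;
    public int OpenCursors => _openCursors;

    // Lazy: the cursor opens on the first MoveNext and closes when the
    // enumeration finishes or the enumerator is disposed.
    public IEnumerable<int> Find()
    {
        _openCursors++;
        try
        {
            for (var i = 0; i < 3; i++) yield return i;
        }
        finally
        {
            _openCursors--;
        }
    }
}

public static class Processing
{
    // Risky shape: awaits run while the cursor is still open, and the
    // continuation may resume on a different thread.
    public static async Task<int> LazyAsync(FakeStore store)
    {
        var maxOpen = 0;
        foreach (var item in store.Find())
        {
            await Task.Yield();
            maxOpen = Math.Max(maxOpen, store.OpenCursors);
        }
        return maxOpen;
    }

    // Safer shape: ToList() drains and closes the cursor before any await.
    public static async Task<int> MaterializedAsync(FakeStore store)
    {
        var maxOpen = 0;
        var items = store.Find().ToList();
        foreach (var item in items)
        {
            await Task.Yield();
            maxOpen = Math.Max(maxOpen, store.OpenCursors);
        }
        return maxOpen;
    }
}
```

In the lazy version a cursor is open during every await; in the materialized version none is. The trade-off is memory: `ToList()` loads the whole result set before processing starts.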

Regrettably, there still seems to be a fundamental issue in LiteDB 5 when using tasks. The read-lock mechanism uses the ManagedThreadId as its key, and in the async/await world that id can be the same for multiple Task contexts; this is where things sometimes go bad. It is extremely hard to solve on our side, because our app consists of a bunch of API controllers and relies heavily on async/await.

I tried to fix this in a fork of LiteDB but gave up. I got the problem fixed by also taking the task context id into account and implementing an alternative to ReaderWriterLockSlim that uses semaphores. Almost all tests passed, but I seemed to get random failures when executing tests in parallel, so I must have missed something. I'm not familiar enough with LiteDB's code base, and time constraints led me to give up for now.

Note: there is even a (disabled) test case in LiteDB's code base referencing Parallel.ForEach, which also seems to be a root cause of these "random" occurrences where the entire LiteDB instance goes brrr.

For now I have been forced to do some lame try/catch that looks for specific errors and destroys and recreates the LiteDB instance, which at least mitigates the issue a bit for us.
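
The thread-id point can be shown with plain .NET primitives, independent of LiteDB. The sketch below only demonstrates general runtime behavior: tasks share a small set of pool threads, ReaderWriterLockSlim is thread-affine, and SemaphoreSlim is not. `ThreadAffinityDemo` is a hypothetical name, and nothing here reproduces LiteDB's actual locking code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadAffinityDemo
{
    // Runs `taskCount` small async tasks and returns how many distinct
    // managed thread ids they observed after an await.
    public static async Task<int> DistinctThreadIdsAsync(int taskCount)
    {
        var ids = new ConcurrentBag<int>();
        var tasks = Enumerable.Range(0, taskCount).Select(async _ =>
        {
            await Task.Yield(); // continuation runs on whichever thread is free
            ids.Add(Environment.CurrentManagedThreadId);
        });
        await Task.WhenAll(tasks);
        return ids.Distinct().Count();
    }

    // ReaderWriterLockSlim tracks ownership per thread: releasing a read
    // lock from a thread that never entered it throws.
    public static bool ExitOnOtherThreadFails()
    {
        using var rw = new ReaderWriterLockSlim();
        rw.EnterReadLock();
        var failed = false;
        var other = new Thread(() =>
        {
            try { rw.ExitReadLock(); }
            catch (SynchronizationLockException) { failed = true; }
        });
        other.Start();
        other.Join();
        rw.ExitReadLock(); // release on the thread that actually entered
        return failed;
    }

    // SemaphoreSlim has no thread affinity: any thread may release it.
    public static bool SemaphoreReleaseOnOtherThreadWorks()
    {
        using var gate = new SemaphoreSlim(1, 1);
        gate.Wait();
        var other = new Thread(() => gate.Release());
        other.Start();
        other.Join();
        return gate.CurrentCount == 1;
    }
}
```

On a typical machine, `DistinctThreadIdsAsync(100)` reports far fewer than 100 ids, which is why a lock table keyed only by thread id cannot tell two logical operations apart once they interleave on the same pool thread. A semaphore-based lock avoids the affinity problem but gives up the per-thread recursion tracking that ReaderWriterLockSlim provides, which may be part of why the fork was hard to get fully right.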

hosadoya added the bug label Jul 28, 2024
