Increasing memory usage on writes #1618

Open
cmelchior opened this issue Dec 29, 2023 · 10 comments

cmelchior commented Dec 29, 2023

It looks like we might have a memory leak somewhere involving writes. A simple loop like this, running on Android, shows a small continuous increase in memory usage:

viewModelScope.launch {
    while (true) {
        realm.write {
            copyToRealm(Sample().apply { stringField = Utils.createRandomString(1024 * 1024) })
        }
    }
}

It can take several minutes (if not more) for it to show, so I am still not 100% sure there is a leak, but we have a customer who is running into crashes after several hours with a loop that writes 25 times per second.


sync-by-unito bot commented Jan 8, 2024

➤ rorbech commented:

This is the public variant of HELP-53315 to investigate a potential memory leak on writes ... though we cannot currently replicate it.


cmelchior commented Jan 22, 2024

This seems to replicate it on an Android Pixel 5 (API 33) emulator:

  GlobalScope.launch(Dispatchers.Default) {
    while(true) {
      val p: Float = Random.nextFloat()
      val now = Instant.now()

      viewModelScope.launch(Dispatchers.IO) {
        kotlin.runCatching {
          realm.write {
            copyToRealm(Pressure().apply {
              timestamp = now.toEpochMilli()
              hPa = p
            })
          }
        }.onFailure { Log.e("TAG", it.stackTraceToString(), it) }
      }

      val frequency =
        if (lastTime != now)
          (1000.0 / (between(lastTime, now).toMillis())).roundToInt()
        else
          0

      lastTime = now

      _text.postValue(
        "${
          startTime.atZone(ZoneId.systemDefault()).toLocalDateTime()
            .truncatedTo(ChronoUnit.SECONDS)
        }\n" +
                "${between(startTime, now).toHoursMinutesSecondsShort()}\n" +
                "${"%.4f".format(p)}\n" +
                "$frequency Hz")
    }
  }
class Pressure : RealmObject
{
  @PrimaryKey
  var _id: ObjectId = ObjectId()
  var outingId: ObjectId? = null
  @Index
  var timestamp: Long = 0L
  var hPa: Float = 0.0F
  var hPa0: Float? = null
  var hPa0Observed: Boolean? = null
  var slope: Int? = null
  var accuracy: Int? = null

  val meters get() = hPaToMeters(hPa, hPa0)
  val feet get() = metersToFeet(meters)

  val timeString get() = timestamp.toLocalTimeString()
  val dateString get() = timestamp.toLocalDateString()

  var time: Instant
    get() = Instant.ofEpochMilli(timestamp)
    set(value)
    {
      timestamp = value.toEpochMilli()
    }
}

At least it shows an increase in the "Other" memory region that does not come down again.

cmelchior commented:

After more testing, it seems to be related to the number of active versions:

[image]

Modifying the code to

  GlobalScope.launch(Dispatchers.Default) {
    while(true) {
      val p: Float = Random.nextFloat()
      val now = Instant.now()
      viewModelScope.launch(Dispatchers.IO) {
          realm.write<Unit> {
            copyToRealm(Pressure().apply {
              timestamp = now.toEpochMilli()
              hPa = p
            })
          }
      }
      _text.postValue("Version: ${realm.getNumberOfActiveVersions()}")
    }
  }

shows that the number of active versions keeps increasing even though nothing should hold on to them. My best guess is that our internal GC isn't fast enough to keep up with the massive number of references being created.
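
As a side note, it should be possible to at least fail fast here instead of letting versions pile up silently. A minimal sketch, assuming the configuration builder exposes maxNumberOfActiveVersions like Realm Java does:

import io.realm.kotlin.Realm
import io.realm.kotlin.RealmConfiguration

// Sketch: cap the number of pinned versions so the problem surfaces as an
// exception instead of unbounded file/native-memory growth. Assumes the
// builder exposes maxNumberOfActiveVersions, as in Realm Java.
val config = RealmConfiguration.Builder(schema = setOf(Pressure::class))
    .maxNumberOfActiveVersions(64L)   // writes throw once more than 64 versions are alive
    .build()
val realm = Realm.open(config)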

cmelchior commented:

Surprisingly, the above run actually eventually seemed to catch up:

[image]

But then it crashed because it went OOM:

java.lang.OutOfMemoryError: Failed to allocate a 24 byte allocation with 1744152 free bytes and 1703KB until OOM, target footprint 201326592, growth limit 201326592; failed due to fragmentation (largest possible contiguous allocation 0 bytes). Number of 256KB sized free regions are: 0
    at com.oliverclimbs.realmtest.ui.home.HomeViewModel$1$1.invokeSuspend(HomeViewModel.kt:67)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
    at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:100)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
    Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@85dc4cc, Dispatchers.IO]

cmelchior commented:

After looking more into this, it looks like this behavior can be explained by our RealmFinalizer thread either not keeping up or pointers simply not being GC'ed in time.

By adding an atomic counter to the finalizer, I was able to show that the reference queue keeps growing with the following code:

    GlobalScope.launch(Dispatchers.Default) {
      while(true) {
        val p: Float = Random.nextFloat()
        val now = Instant.now()
        viewModelScope.launch(Dispatchers.IO) {
            realm.write<Unit> {
              copyToRealm(Pressure().apply {
                timestamp = now.toEpochMilli()
                hPa = p
              })
            }
        }
        withContext(Dispatchers.Main) {
          _text.postValue("Version: ${realm.getNumberOfActiveVersions()}")
        }
      }
    }

I could see incremental bursts of things being GC'ed, but the overall trend was that the queue kept growing and growing... just pausing the writes didn't help either. My guess is that the memory allocator didn't consider the many thousands of NativePointers important enough to GC.
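
For illustration, the kind of instrumentation I mean is roughly the following (a sketch, not the actual realm-kotlin finalizer code; all names are made up):

import java.lang.ref.PhantomReference
import java.lang.ref.ReferenceQueue
import java.util.concurrent.atomic.AtomicLong

// Sketch: a finalizer thread draining a ReferenceQueue, with a counter that
// shows how far behind the native cleanup is. A steadily growing value means
// the thread (or the GC feeding it) cannot keep up with the allocation rate.
val pendingNativeRefs = AtomicLong(0)
val referenceQueue = ReferenceQueue<Any>()

// The returned reference must be kept reachable until it has been enqueued.
fun track(obj: Any): PhantomReference<Any> {
    pendingNativeRefs.incrementAndGet()
    return PhantomReference(obj, referenceQueue)
}

val finalizerThread = Thread {
    while (true) {
        referenceQueue.remove()                 // blocks until the GC enqueues a reference
        pendingNativeRefs.decrementAndGet()     // the native pointer would be freed here
    }
}.apply {
    isDaemon = true
    priority = Thread.MAX_PRIORITY              // the "max priority" experiment mentioned below
    start()
}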

The result of this would be that either 1) we ran out of disk space because the Realm file kept growing (because of unclaimed versions), or 2) we went OOM because we exhausted the native memory space.

Only by stopping the writes and then manually calling the GC was I able to empty the queue.

This is not ideal in "fast write"-scenarios like listening to sensor updates.

I tried to modify our GC thread to have max priority. This seemed to help a little bit, but the queue of pointers was still growing.

So right now I guess that for these scenarios, we need some sort of "allocation-free" insert, or at least an insert that automatically cleans up as soon as the write is completed.

In Realm Java we have a bulk insert method called insert() that does this, and it is tracked for Kotlin here: #959. I would guess this kind of method would also fix the problem described in this issue.
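
Until something like that exists, the only app-side mitigation I can think of is to batch many objects into a single write block, so one transaction/version covers many inserts instead of one each. A rough sketch of what that could look like for the sensor scenario (the Channel-based batching is just my suggestion, not an SDK feature; Pressure is the model from above):

import io.realm.kotlin.Realm
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch

// Sketch of a batching workaround: buffer incoming samples in a channel and
// write them in chunks, so a single Realm version covers many objects.
fun startBatchedWriter(realm: Realm, scope: CoroutineScope): Channel<Pressure> {
    val samples = Channel<Pressure>(capacity = Channel.UNLIMITED)
    scope.launch(Dispatchers.IO) {
        val batch = mutableListOf<Pressure>()
        while (true) {
            batch.add(samples.receive())                     // suspend until a sample arrives
            while (batch.size < 100) {                       // drain whatever else is queued
                batch.add(samples.tryReceive().getOrNull() ?: break)
            }
            realm.write {
                batch.forEach { copyToRealm(it) }            // one transaction for the whole batch
            }
            batch.clear()
        }
    }
    return samples
}

// Producer side (e.g. the sensor callback) would then just do:
//   samplesChannel.trySend(Pressure().apply { timestamp = now.toEpochMilli(); hPa = p })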

cmelchior commented:

A solution to this problem is most likely something like: #959

OluwoleOyetoke commented:

Any chance fixing this issue will be prioritized soon? @cmelchior, did you figure out any workaround in the meantime?

santaevpavel commented:

Hi, we moved to Realm Kotlin from Realm Java in our project and we see a 10-30% increase in memory usage.
I also checked memory usage in a test app and observed a similar 10-30% memory increase depending on the use case. For example, this use case seems to indicate a memory leak:

  • Open Realm
  • Observe ~5000 entities with 100 queries
  • Write 1000 entities
  • Close Realm
  • Close the app
  • Trigger GC a few times from the profiler

This use case shows that the native memory size doesn't decrease after closing the Realm and triggering GC.
Additionally, I profiled native memory allocations using Perfetto (heapprofd) (https://perfetto.dev/docs/case-studies/memory#heapprofd). It showed 4 MB of memory leaks in realm_results_count and realm_results_resolve_in. I'm not sure what might be causing the leak in realm_results_count.
It also seems that calling toList() on RealmResults<> increases the memory leakage. Calling toList() invokes RealmResultsImpl::size to get the size of the RealmResults (see the JVM implementation of toList(): https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/jvm/runtime/kotlin/jvm/internal/CollectionToArray.kt#L72).
Also, no leaks are visible in the Java/Kotlin heap.
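
To make the scenario concrete, here is roughly what it looks like in code (a sketch against the public realm-kotlin API; Entity, the group field and scope are placeholders, not the actual code from the linked project):

import io.realm.kotlin.Realm
import io.realm.kotlin.RealmConfiguration
import io.realm.kotlin.ext.query
import io.realm.kotlin.types.RealmObject
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.launch

class Entity : RealmObject {          // placeholder model
    var group: Int = 0
}

// Sketch of the leak scenario: open, observe with many queries, write, close, GC.
fun runScenario(scope: CoroutineScope) {
    val realm = Realm.open(RealmConfiguration.create(schema = setOf(Entity::class)))

    // Observe the entities through ~100 queries.
    val observers = (0 until 100).map { group ->
        scope.launch {
            realm.query<Entity>("group == $0", group).asFlow().collect { /* keep observing */ }
        }
    }

    // Write 1000 entities.
    realm.writeBlocking { repeat(1000) { copyToRealm(Entity()) } }

    // Close everything, then trigger GC a few times from the profiler:
    // the native memory does not come back down.
    observers.forEach { it.cancel() }
    realm.close()
}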

Here is a test project I used for profiling: https://github.com/santaevpavel/realm-benchmark

Realm version: 2.3.0 and 3.0.0
Kotlin version: 2.0.20
AGP version: 8.3.2

[image]
[image]

santaevpavel commented:

I also tried to figure out what makes memory usage high when toList() is called on RealmResults.

  • Calling size() on RealmResultsImpl invokes the native function realm_results_count and passes a pointer to Results (from Realm Core) in query mode (m_mode == Mode::Query).
  • realm_results_count invokes Results::size(), which invokes Results::do_size().
  • do_size() calls m_query.count(m_descriptor_ordering).
  • m_query.count() creates a TableView and calls apply_descriptor_ordering and size().
  • TableView::apply_descriptor_ordering() calls TableView::do_sync, which iterates through all entities and fills m_key_values.

So iterating a RealmResults with N elements (or calling the generic Iterable<T>.toList() extension on it) calls size() N times, because the iterator of AbstractList calls size() on every hasNext() call. That means the elements are traversed N*N times (do_sync() is called N times, which does not look right). This may explain why we see memory usage peaks when toList() is called.
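
If this analysis is correct, an app-side workaround would be to avoid the AbstractList iterator entirely and copy the results with a single cached size. A sketch (it assumes indexed get() does not re-run the expensive count the way size() does):

// Sketch: copy a RealmResults (or any List) without going through the iterator,
// so size() is evaluated once instead of once per hasNext() call.
fun <T> List<T>.toSnapshotList(): List<T> {
    val count = size                  // single size() -> realm_results_count call
    val out = ArrayList<T>(count)
    for (i in 0 until count) {
        out.add(this[i])              // indexed access, no iterator involved
    }
    return out
}

// usage: val snapshot = realmResults.toSnapshotList()   // instead of realmResults.toList()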

I wasn't able to debug it, as native calls can't be debugged without sources. I also tried to run the Android tests in this repository, but it didn't work. So basically these are just my thoughts. @nhachicha, could you check it or confirm that the problem I described is correct?

santaevpavel commented:

I've found what is leaking. It's happening in the JNI layer that is generated by SWIG: arrays of longs are not released. I've already found a fix and will open a PR.
