Increasing memory usage on writes #1618

Open
cmelchior opened this issue Dec 29, 2023 · 10 comments

cmelchior commented Dec 29, 2023

It looks like we might have a memory leak somewhere involving writes. A simple loop like this, running on Android, shows a small continuous increase in memory usage:

viewModelScope.launch {
    while (true) {
        realm.write {
            copyToRealm(Sample().apply { stringField = Utils.createRandomString(1024 * 1024) })
        }
    }
}

It can take several minutes (if not more) for it to show, so I am still not 100% sure there is a leak, but we have a customer who is running into crashes after several hours with a loop that writes 25 times per second.


sync-by-unito bot commented Jan 8, 2024

➤ rorbech commented:

This is the public variant of HELP-53315 to investigate a potential memory leak on writes ... though we cannot currently replicate it.


cmelchior commented Jan 22, 2024

This seems to replicate it on an Android Pixel 5 (API 33) emulator:

  GlobalScope.launch(Dispatchers.Default) {
    while(true) {
      val p: Float = Random.nextFloat()
      val now = Instant.now()

      viewModelScope.launch(Dispatchers.IO) {
        kotlin.runCatching {
          realm.write {
            copyToRealm(Pressure().apply {
              timestamp = now.toEpochMilli()
              hPa = p
            })
          }
        }.onFailure { Log.e("TAG", it.stackTraceToString(), it) }
      }

      val frequency =
        if (lastTime != now)
          (1000.0 / (between(lastTime, now).toMillis())).roundToInt()
        else
          0

      lastTime = now

      _text.postValue(
        "${
          startTime.atZone(ZoneId.systemDefault()).toLocalDateTime()
            .truncatedTo(ChronoUnit.SECONDS)
        }\n" +
                "${between(startTime, now).toHoursMinutesSecondsShort()}\n" +
                "${"%.4f".format(p)}\n" +
                "$frequency Hz")
    }
  }
class Pressure : RealmObject
{
  @PrimaryKey
  var _id: ObjectId = ObjectId()
  var outingId: ObjectId? = null
  @Index
  var timestamp: Long = 0L
  var hPa: Float = 0.0F
  var hPa0: Float? = null
  var hPa0Observed: Boolean? = null
  var slope: Int? = null
  var accuracy: Int? = null

  val meters get() = hPaToMeters(hPa, hPa0)
  val feet get() = metersToFeet(meters)

  val timeString get() = timestamp.toLocalTimeString()
  val dateString get() = timestamp.toLocalDateString()

  var time: Instant
    get() = Instant.ofEpochMilli(timestamp)
    set(value)
    {
      timestamp = value.toEpochMilli()
    }
}

At least it shows an increase in the "Other" memory region that does not come down again.

cmelchior commented:

After more testing, it seems to be related to the number of active versions:

[image]

Modifying the code to

  GlobalScope.launch(Dispatchers.Default) {
    while(true) {
      val p: Float = Random.nextFloat()
      val now = Instant.now()
      viewModelScope.launch(Dispatchers.IO) {
          realm.write<Unit> {
            copyToRealm(Pressure().apply {
              timestamp = now.toEpochMilli()
              hPa = p
            })
          }
      }
      _text.postValue("Version: ${realm.getNumberOfActiveVersions()}")
    }
  }

shows that the number of active versions keeps increasing even though nothing should hold on to them. My best guess is that our internal GC isn't fast enough to keep up with the massive number of references being created.
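
As a side note, it should be possible to at least fail fast here instead of letting versions pile up silently. A minimal sketch, assuming the configuration builder exposes maxNumberOfActiveVersions like Realm Java does:

import io.realm.kotlin.Realm
import io.realm.kotlin.RealmConfiguration

// Sketch: cap the number of pinned versions so the problem surfaces as an
// exception instead of unbounded file/native-memory growth. Assumes the
// builder exposes maxNumberOfActiveVersions, as in Realm Java.
val config = RealmConfiguration.Builder(schema = setOf(Pressure::class))
    .maxNumberOfActiveVersions(64L)   // writes throw once more than 64 versions are alive
    .build()
val realm = Realm.open(config)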

cmelchior commented:

Surprisingly, the above run actually eventually seemed to catch up:

[image]

But then it crashed because it went OOM:

java.lang.OutOfMemoryError: Failed to allocate a 24 byte allocation with 1744152 free bytes and 1703KB until OOM, target footprint 201326592, growth limit 201326592; failed due to fragmentation (largest possible contiguous allocation 0 bytes). Number of 256KB sized free regions are: 0
    at com.oliverclimbs.realmtest.ui.home.HomeViewModel$1$1.invokeSuspend(HomeViewModel.kt:67)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
    at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:100)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
    Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@85dc4cc, Dispatchers.IO]

cmelchior commented:

After looking more into this, it looks like this behavior can be explained by our RealmFinalizer thread either not keeping up or pointers simply not being GC'ed in time.

By adding an atomic counter to the finalizer, I was able to show that the reference queue keeps growing with the following code:

    GlobalScope.launch(Dispatchers.Default) {
      while(true) {
        val p: Float = Random.nextFloat()
        val now = Instant.now()
        viewModelScope.launch(Dispatchers.IO) {
            realm.write<Unit> {
              copyToRealm(Pressure().apply {
                timestamp = now.toEpochMilli()
                hPa = p
              })
            }
        }
        withContext(Dispatchers.Main) {
          _text.postValue("Version: ${realm.getNumberOfActiveVersions()}")
        }
      }
    }

I could see incremental bursts of things being GC'ed, but the overall trend was that the queue kept growing and growing... just pausing the writes didn't help either. My guess is that the memory allocator didn't consider the many thousands of NativePointers important enough to GC.
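
For illustration, the kind of instrumentation I mean is roughly the following (a sketch, not the actual realm-kotlin finalizer code; all names are made up):

import java.lang.ref.PhantomReference
import java.lang.ref.ReferenceQueue
import java.util.concurrent.atomic.AtomicLong

// Sketch: a finalizer thread draining a ReferenceQueue, with a counter that
// shows how far behind the native cleanup is. A steadily growing value means
// the thread (or the GC feeding it) cannot keep up with the allocation rate.
val pendingNativeRefs = AtomicLong(0)
val referenceQueue = ReferenceQueue<Any>()

// The returned reference must be kept reachable until it has been enqueued.
fun track(obj: Any): PhantomReference<Any> {
    pendingNativeRefs.incrementAndGet()
    return PhantomReference(obj, referenceQueue)
}

val finalizerThread = Thread {
    while (true) {
        referenceQueue.remove()                 // blocks until the GC enqueues a reference
        pendingNativeRefs.decrementAndGet()     // the native pointer would be freed here
    }
}.apply {
    isDaemon = true
    priority = Thread.MAX_PRIORITY              // the "max priority" experiment mentioned below
    start()
}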

The result of this would be that either 1) we ran out of disk space because the Realm file kept growing (because of unclaimed versions), or 2) we went OOM because we exhausted the native memory space.

Only by stopping the writes and then manually calling the GC was I able to empty the queue.

This is not ideal in "fast write"-scenarios like listening to sensor updates.

I tried to modify our GC thread to have max priority. This seemed to help a little bit, but the queue of pointers was still growing.

So right now I guess that for these scenarios, we need some sort of "allocation-free" insert, or at least an insert that automatically cleans up as soon as the write is completed.

In Realm Java we have a bulk insert method called insert() that does this, and it is tracked for Kotlin here: #959. I would guess this kind of method would also fix the problem described in this issue.
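
Until something like that exists, the only app-side mitigation I can think of is to batch many objects into a single write block, so one transaction/version covers many inserts instead of one each. A rough sketch of what that could look like for the sensor scenario (the Channel-based batching is just my suggestion, not an SDK feature; Pressure is the model from above):

import io.realm.kotlin.Realm
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch

// Sketch of a batching workaround: buffer incoming samples in a channel and
// write them in chunks, so a single Realm version covers many objects.
fun startBatchedWriter(realm: Realm, scope: CoroutineScope): Channel<Pressure> {
    val samples = Channel<Pressure>(capacity = Channel.UNLIMITED)
    scope.launch(Dispatchers.IO) {
        val batch = mutableListOf<Pressure>()
        while (true) {
            batch.add(samples.receive())                     // suspend until a sample arrives
            while (batch.size < 100) {                       // drain whatever else is queued
                batch.add(samples.tryReceive().getOrNull() ?: break)
            }
            realm.write {
                batch.forEach { copyToRealm(it) }            // one transaction for the whole batch
            }
            batch.clear()
        }
    }
    return samples
}

// Producer side (e.g. the sensor callback) would then just do:
//   samplesChannel.trySend(Pressure().apply { timestamp = now.toEpochMilli(); hPa = p })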

cmelchior commented:

A solution to this problem is most likely something like: #959

OluwoleOyetoke commented:

Any chance fixing this issue will be prioritized soon? @cmelchior, did you figure out any workaround in the meantime?

santaevpavel commented:

Hi, we moved to Realm Kotlin from Realm Java in our project and we see a 10-30% increase in memory usage.
I also checked memory usage in a test app and observed a similar 10-30% memory increase depending on the use case. For example, this use case seems to indicate a memory leak:

  • Open Realm
  • Observe ~5000 entities with 100 queries
  • Write 1000 entities
  • Close Realm
  • Close the app
  • Trigger GC a few times from the profiler

This use case shows that the native memory size doesn't decrease after closing the Realm and triggering GC.
Additionally, I profiled native memory allocations using Perfetto (heapprofd) (https://perfetto.dev/docs/case-studies/memory#heapprofd). It showed 4 MB of memory leaks in realm_results_count and realm_results_resolve_in. I'm not sure what might be causing the leak in realm_results_count.
It also seems that calling toList() on RealmResults<> increases the memory leakage. Calling toList() invokes RealmResultsImpl::size to get the size of the RealmResults (see the JVM implementation of toList(): https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/jvm/runtime/kotlin/jvm/internal/CollectionToArray.kt#L72).
Also, no leaks are visible in the Java/Kotlin heap.
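
To make the scenario concrete, here is roughly what it looks like in code (a sketch against the public realm-kotlin API; Entity, the group field and scope are placeholders, not the actual code from the linked project):

import io.realm.kotlin.Realm
import io.realm.kotlin.RealmConfiguration
import io.realm.kotlin.ext.query
import io.realm.kotlin.types.RealmObject
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.launch

class Entity : RealmObject {          // placeholder model
    var group: Int = 0
}

// Sketch of the leak scenario: open, observe with many queries, write, close, GC.
fun runScenario(scope: CoroutineScope) {
    val realm = Realm.open(RealmConfiguration.create(schema = setOf(Entity::class)))

    // Observe the entities through ~100 queries.
    val observers = (0 until 100).map { group ->
        scope.launch {
            realm.query<Entity>("group == $0", group).asFlow().collect { /* keep observing */ }
        }
    }

    // Write 1000 entities.
    realm.writeBlocking { repeat(1000) { copyToRealm(Entity()) } }

    // Close everything, then trigger GC a few times from the profiler:
    // the native memory does not come back down.
    observers.forEach { it.cancel() }
    realm.close()
}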

Here is a test project I used for profiling: https://github.com/santaevpavel/realm-benchmark

Realm version: 2.3.0 and 3.0.0
Kotlin version: 2.0.20
AGP version: 8.3.2

[image]
[image]

santaevpavel commented:

I also tried to figure out what makes memory usage high when toList() is called on RealmResults.

  • Calling size() on RealmResultsImpl invokes the native function realm_results_count and passes a pointer to Results (from Realm Core) in query mode (m_mode == Mode::Query).
  • realm_results_count invokes Results::size(), which invokes Results::do_size().
  • do_size() calls m_query.count(m_descriptor_ordering).
  • m_query.count() creates a TableView and calls apply_descriptor_ordering and size().
  • TableView::apply_descriptor_ordering() calls TableView::do_sync, which iterates through all entities and fills m_key_values.

So iterating a RealmResults with N elements (or calling the generic Iterable<T>.toList() extension on it) calls size() N times, because the iterator of AbstractList calls size() on every hasNext() call. That means the elements are traversed N*N times (do_sync() is called N times, which does not look right). This may explain why we see memory usage peaks when toList() is called.
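
If this analysis is correct, an app-side workaround would be to avoid the AbstractList iterator entirely and copy the results with a single cached size. A sketch (it assumes indexed get() does not re-run the expensive count the way size() does):

// Sketch: copy a RealmResults (or any List) without going through the iterator,
// so size() is evaluated once instead of once per hasNext() call.
fun <T> List<T>.toSnapshotList(): List<T> {
    val count = size                  // single size() -> realm_results_count call
    val out = ArrayList<T>(count)
    for (i in 0 until count) {
        out.add(this[i])              // indexed access, no iterator involved
    }
    return out
}

// usage: val snapshot = realmResults.toSnapshotList()   // instead of realmResults.toList()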

I wasn't able to debug it, as native calls can't be debugged without sources. I also tried to run the Android tests in this repository, but it didn't work. So basically these are just my thoughts. @nhachicha, could you check it or confirm that the problem I described is correct?

santaevpavel commented:

I've found what is leaking. It's happening in the JNI layer that is generated by SWIG: arrays of longs are not released. I've already found a fix and will open a PR.
